Home | A Day in a Pile of Work

Starting a Consulting Company

Here we go. I just registered my new venture, a sole proprietorship named “Guillaume Poirier-Morency Consulting Co.”. It will focus on providing consulting services in my areas of expertise such as computer science, programming, bioinformatics, and statistical modelling.

I have a few core ideas:

Apply my expertise to help out people make the most out of computer science in their projects. That could range from better understanding and explaining the challenges underlying their scientific questions to building working prototypes and solutions.
Use cloud services to power solutions and avoid as much as possible the burdens of maintenance.
Provide solutions for customers and release as much as possible under free software licenses.
Use free software for business operations (i.e. GNU Cash, LibreOffice, etc.).

It’s also important for me to avoid competing interests with my work and make sure that I preserve an overall sane work-life balance. To that end, I will pick projects carefully.

I might also deviate a bit into uncharted territories such as personal income tax. I’ve been helping friends and family members recently with their tax returns, and I know a fair bit about the available deductions, credits, etc. to help out people.

No logo, graphics, or website yet. Just an idea. That’s it.

Zero-copy FASTA Parser

Where I work, we deal often deal with large datasets from which we copy the relevant entries into the program memory. However, doing so typically incurs a very large usage of memory, which could leads to memory-bound parallelism if multiple instances are launched.

The memory-bound parallelism issue is arising when a system cannot execute more tasks due to a lack of available memory. It is essentially wasting of all other available resources such as CPU time.

To address this kind of issue, I’ll describe in this post a strategy using memory-mapped files and on-demand processing over a very common data format in bioinformatics: FASTA. The use case is pretty simple: we want to query small and arbitrary subsequences without having to precondition them in allocated memory.

About Virtual Memory Space

The virtual address space is large, very. Think of all the addresse values a 64 bit pointer can take. That’s about 18 quitillions of addressable bytes, which is enough to never be bothered with.

Understandbly, no computer can hold that much of memory. Instead, the operating system partitions the virtual memory into pages and the physical memory into frames. It uses a cache algorithm and load addressed pages into physical frames. Unused pages are stored on disk, in the available swap partitions or compressed into physical memory if you use Zswap¹.

The mmap² system call makes a correspondance between a file and pages in virtual memory. Addressing the memory where the file has been mapped will result in the kernel fetching its content dynamically. Moreover, if multiple processes map the same file, the same frames (i.e. physical memory) will be used across all of them.

void * mmap (void *addr,
             size_t length,
             int prot,
             int flags,
             int fd,
             off_t offset);

Where addr hint the operating system for a memory location, length indicates the size of the mapping, prot indicates permissions on the region, flags holds various options, fd is a file descriptor and offset is a byte offset from the file content. The returned value is the mapped address.

We can use this feature at our advantage by loading our data once and transparently share them across all the instances of our program.

I’m using GLib, a portable C library, and its providen GMappedFile to carefully wrap mmap with reference counting.

g_auto (GMappedFile) fasta_map = g_mapped_file_new ("hg38.fa",
                                                    false);

Our Use Case

To be more specific our use case only require to view small windows (~7 nucleotides) of the sequence at once. If we assume 80 nucleotides per line, we have 80 possible windows from which 73 are free of newlines. The probability for a random subsequence of length 7 of landing on a newline is thus approximately 8.75%.

For the great majority of cases, assuming uniformly distributed subsequence requests, we can simply return the address from the mapped memory.

From now on, we assume that the in-memory mapped document has already been indexed by bookeeping the beginnings of each sequences, which can be easily done with memchr³. The sequence pointer points to some start of a sequence and sequence_len indicate the length before the next one.

To work efficiently, it is worth to index the newlines. For this purpose, we use a GPtrArray, which is a simple pointer array implementation that we populate with the addresses of the newlines in the mapped buffer.

const gchar *sequence = "ACTG\nACTG";
gsize sequence_len    = 9;

g_autoptr (GPtrArray) sequence_skips =
    g_ptr_array_sized_new (sequence_len / 80); // line feed every 80 characters

const gchar* seq = sequence;
while ((seq = memchr (seq, '\n', sequence_len - (seq - sequence))))
{
    g_ptr_array_add (sequence_skips, (gpointer) seq);
    seq++; // jump right after the line feed
}

A newline can either preceed, follow or land within the subsequence.

all thoses preceeding the desired subsquence shifts the sequence to the right
all those within the subsequence must be stripped
the remaining newlines can be safely ignored

If only the first or last condition apply, we’re in the 92.5% of the cases as we can simply return the corresponding memory address.

gsize subsequence_offset = 1;
gsize subsequence_len = 7;

We first position our subsequence at its initial location.

const gchar *subsequence = sequence + subsequence_offset;

We need some bookkeeping for filling a fixed-width buffer if a newline land within our subsequence.

static gchar subsequence_buffer[64];
gsize subsequence_buffer_offset = 0;

Now, for each linefeed we’ve collected, we’re going to test our three conditions and either move the subsequence right or fill the static buffer.

The second condition require some work. Using the indexed newlines, we basically trim the sequence into a static buffer that is returned. Although we lose thread safety working this way, it will be mitigated by process-level parallelism.

gint i;
for (i = 0; i < sequence_skips->len; i++)
{
    const gchar *linefeed = g_ptr_array_index (sequence_skips, i);
    if (linefeed <= subsequence)
    {
        subsequence++; // move the subsequence right
    }
    else if (linefeed < subsequence + subsequence_len)
    {
        // length until the next linefeed
        gsize len_to_copy = linefeed - subsequence;

        memcpy (subsequence_buffer + subsequence_buffer_offset,
                subsequence,
                len_to_copy);

        subsequence_buffer_offset += len_to_copy;
        subsequence += len_to_copy + 1; // jump right after the linefeed
    }
    else
    {
        break; // linefeed supersedes the subsequence
    }
}

Lastly we check whether or not we’ve used the static buffer, in which case we copy any trailing sequence.

if (subsequence_buffer_offset > 0)
{
    if (subsequence_buffer_offset < subsequence_len)
    {
        memcpy (subsequence_buffer + subsequence_buffer_offset,
                subsequence,
                subsequence_len - subsequence_buffer_offset);
    }

    return subsequence_buffer;
}
else
{
    return subsequence;
}

It’s possible to use a binary search strategy to obtain the range of newlines affecting the position of the requested subsequence, but since the number of newlines is considerably small, I ignored this optimization so far.

Here we are with our zero-copy FASTA parser that efficiently look for small subsequences.

P.S.: This technique has been used for the C rewrite of miRBooking⁴ I’ve been working on these past weeks.

Valadoc.org Rewrite and More! in Valum

The rewrite of valadoc.org in Vala using Valum has been completed and should be deployed eventually be elementary OS team (see pull #40). There’s a couple of interesting stuff there too:

experimental search API using JSON via the /search endpoint
GLruCache now has Vala bindings and an improved API
an eventual GMysql wrapper around the C client API if extracting the classes I wrote is worth it

In the meantime, you can test it at valadoc2.elementary.io and report any regression on the pull-request.

Valum 0.3 has been patched and improved while I have been working on the 0.4 feature set. There’s a work-in-progress WebSocket middleware, VSGI 1.0 and support for PyGObject planned.

If everything goes as planned, I should finish the AJP backend and maybe consider Lwan.

On the top, there’s Windows support coming, although the most difficult part is to test it. I might need some help there to setup AppVeyor CI.

I’m aware of the harsh discussions about the state of Vala and whether or not it will just end into an abysmal void. I would advocate inertia here: the current state of the language still make it an excelllent candidate for writing GNOME-related software and this is not expected to change.

Announcing Valum 0.3 in Valum

The first release candidate for Valum 0.3 has been launched today!

Get it, test it and be the first to find a bug! The final release will come shortly after along with various Linux distributions packages.

This post review the changes and features that have been introduced since the 0.2. There’s been a lot of work, so take a comfortable seat and brew yourself a strong coffee.

The most significant change has probably been the introduction of Meson as a build system and all the new deployment strategy it now makes possible.

If you prefer avoiding a full install, it’s not possible to use it as a subproject. These are defined as subdirectories of subprojects, which you can conveniently track using git submodules.

project('', 'c', 'vala')

glib = dependency('glib-2.0')
gobject = dependency('gobject-2.0')
gio = dependency('gio-2.0')
soup = dependency('libsoup-2.4')
vsgi = subproject('valum').get_variable('vsgi')
valum = subproject('valum').get_variable('valum')

executable('app', 'app.vala',
           dependencies: [glib, gobject, gio, soup, vsgi, valum])

Once installed, however, all that is needed is to pass --pkg=valum-0.3 to the Vala compiler.

vala --pkg=valum-0.3 app.vala

In app.vala,

using Valum;
using VSGI;

public int main (string[] args) {
    var app = new Router ();

    app.get ("/", (req, res) => {
        return res.expand_utf8 ("Hello world!");
    });

    return Server.@new ("http", handler: app)
                 .run (args);
}

There’s been a lot of new features and I hope I won’t miss any!

There’s a new url_for utility in Router that comes with named route. It basically allow one to reverse URLs patterns defined with rules and raw paths.

All that is needed is to pass a name to rule, path or any method helper.

I discovered the : notation for named varidic arguments if they alternate between strings and values. This is typically used to initialize GLib.Object.

using Valum;
using VSGI;

var app = new Router ();

app.get ("/", (req, res) => {
    return "<a href=\"%s\">View profile of %s</a>".printf (
        app.url_for ("user", id: "5"), "John Doe");
});

app.get ("/users/<int:id>", (req, res, next, ctx) => {
    var id = ctx["id"].get_string ();
    return res.expand_utf8 ("Hello %s!".printf (id));
}, "user");

In Router, we also have:

asterisk to handle * URI
once for performing initialization
path for a path-based route
rule to replace method
register_type rather than a GLib.HashTable<string, Regex>

Another significant change is that the previous state stack has been replaced by a context tree with recursive key resolution. It pretty much maps string to GLib.Value in a non-destructive way.

In terms of new middlewares, you’ll be glad to see all the built-in functionnalities we now support:

authentication with support for the Basic scheme via authenticate
content negotiation via negotiate, accept and more!
static resource delivery from GLib.File and GLib.Resource bundles
basic to strip the Router responsibilities
subdomain
basepath to prefix URLs
cache_control to set the Cache-Control header
branch on raised status codes
perform work safely and don’t let any error leak!
stream events with stream_events

Now, which one to cover?

The basepath is my personal favourite, because it allow one to create prefix-agnostic routers.

var app = new Router ();
var api = new Router ();

// matches '/api/v1/'
api.get ("/", (req, res) => {
    return res.expand_utf8 ("Hello world!");
});

app.use (basepath ("/api/v1", api.handle));

The only missing feature is to retranslate URLs directly from the body. I think we could use some GLib.Converter here.

The negotiate middleware and it’s derivatives are really handy for declaring the available representations of a resource.

app.get ("/", accept ("text/html; text/plain", (req, res, next, ctx, ct) => {
    switch (ct) {
        case "text/html":
            return res.expand_utf8 ("");
        case "text/plain":
            return "Hello world!";
        default:
            assert_not_reached ();
    }
}))

There’s a lot of stuff happening in each of them so refer to the docs!

Quick review into Request and Response, we now have the following helpers:

lookup_query to fetch a query item and deal with its null case
lookup_cookie and lookup_signed_cookie to fetch a cookie
cookies to get cookies from a request and response
convert to apply a GLib.Converter
append to append a chunk into the response body
expand to write a buffer into the response body
expand_stream to pipe a stream
expand_file to pipe a file
end to end a response properly
tee to tee the response body into an additional stream

All the utilities to write the body come in _bytes and _utf8 variants. The latter properly set the content charset when appliable.

Back into Server, implementation have been modularized with GLib.Module and are now dynamically loaded. What used to be a VSGI.<server> namespace now has become simply Server.new ("<name>"). Implementations are installed in ${prefix}/${libdir}/vsgi-0.3/servers, which can be overwritten by the VSGI_SERVER_PATH environment variable.

The VSGI specification is not yet 1.0, so please, don’t write a custom server for now or if you do so, please submit it for inclusion. There’s some work-in-progress for Lwan and AJP as I speak if you have some time to spend.

Options have been moved into GLib.Object properties and a new listen API based on GLib.SocketAddress makes it more convenient than ever.

using VSGI;

var tls_cert = new TlsCertificate.from_files ("localhost.cert",
                                              "localhost.key");
var http_server = Server.new ("http", https: true,
                                      tls_certificate: tls_cert);

http_server.set_application_callback ((req, res) => {
    return res.expand_utf8 ("Hello world!");
});

http_server.listen (new InetSocketAddress (new InetAddress.loopback (SocketFamily.IPV4), 3003));

new MainLoop ().run ();

The GLib.Application code has been extracted into the new VSGI.Application cushion used when calling run. It parses the CLI, set the logger and handle SIGTERM into a graceful shutdown.

Server can also fork to scale on multicore architectures. I’ve backtracked on the Worker class to deal with IPC communication, but if anyone is interested into building a nice clustering system, I would be glad to look into it.

That wraps it up, the rest can be discovered in the updated docs. The API docs should be available shortly via valadoc.org.

I manage to cover this exhaustively with abidiff, a really nice tool to diff two ELF files.

Long-term notes

Here’s some long-term notes for things I couldn’t put into this release or that I plan at a much longer term.

multipart streams
digest authentication
async delegates
epoll and kqueue with wip/pollcore
schedule future release with the GNOME project
GIR introspection and typelibs for PyGObject and Gjs

The GIR and typelibs stuff might not be suitable for Valum, but VSGI could have a bright future with Python or JavaScript bindings.

Coming releases will be much less time-consuming as there’s been a big step to make to have something actually usable. Maybe every 6 months or so.

What is Meson?

I have discovered Meson a couple of years back and since then use it for most of my projects written in Vala. This post is an attempt at describing the good, bad and ugly of the build system.

So, what is Meson?

a build system
portable (see Python portability)
a Ninja generator
use case oriented
fast
opiniated

What it’s not?

a general purpose build system
a Turing-complete language
extensible (only in Python)

It handle 80% of the cases nicely and elegantly.

Since it is use case oriented, features are introduced on need. It keeps a tight balance between conciseness, generality and features.

It mixes configure and build step so that the build essentially become one big tree. Then, the build system determine what goes into the configuration and what goes into the build.

The cognitive load is very low, which means it’s very easy to learn the basics and make actual usage of it. This is critical, because all the time spent on setting the build hardly contribute to the project goal.

The following is a basic build that check for dependencies (using pkg-config) and build an executable:

project('Meson Example', 'c', 'vala')

glib = dependency('glib-2.0')
gobject = dependency('gobject-2.0')

executable('app', 'app.vala', dependencies: [glib, gobject])

Building becomes a piece of cake:

mkdir build && cd build
meson ..
ninja

Only a few keywords are sufficient for most builds:

executable
library with shared_library and static_library
dependency
declare_dependency

Built-in benchmarks and tests, just pass the executable to either benchmark or test.

The main downside is that if what you want to do is not supported, you either have to hack things or wait until the feature gets into the build system.

The system is very opiniated. It’s both a good and bad thing. Good since you don’t need to write a lot to get most jobs done. Bad because you might hit a wall eventually.

There’s also the Python question. It requires at least 3.4. This is becoming less an problematic as old distributions progressively die out, but still can prevent you now. Here’s a few ideas to remedy this problem:

build a dependency-free zipball (see issue #588)
backport Meson to older Python version

Meson is getting better over time and so far has managed to become the best build system for Vala. This is why I highly recommend it.

Merged GModule branch! in Valum

Valum now support dynamically loadable server implementation with GModule!

Server are typically looked up in /usr/lib64/vsgi/servers with the libvsgi-<name>.so pattern (although this is highly system-dependent).

This works by setting the RPATH to $ORIGIN/vsgi/servers of the VSGI shared library so that it looks into that folder first.

The VSGI_SERVER_PATH environment variable can be set as well to explicitly provide a directory containing implementations.

To implement a compliant VSGI server, all you need is a server_init symbol which complies with ServerInitFunc delegate like the following:

[ModuleInit]
public Type server_init (TypeModule type_module) {
    return typeof (VSGI.Custom.Server);
}

public class VSGI.Custom.Server : VSGI.Server {
    // ...
}

It has to return a type that is derived from VSGI.Server and instantiable with GLib.Object.new. The Vala compiler will automatically generate the code to register class and interfaces into the type_module parameter.

Some code from CGI has been moved into VSGI to provide uniform handling of its environment variables. If the protocol you want complies with that, just subclass (or directly use) VSGI.CGI.Request and it will perform all the required initialization.

public class VSGI.Custom.Request : VSGI.CGI.Request {
    public Request (IOStream connection, string[] environment) {
        base (connection, environment);
    }
}

For more flexibility, servers can be loaded with ServerModule directly, allowing one to specify an explicit lookup directory and control when the module should be loaded or unloaded.

var cgi_module = new ServerModule (null, "cgi");

if (!cgi_module.load ()) {
    assert_not_reached ();
}

var server = Object.new (cgi_module.server_type);

I received very useful support from Nirbheek Chauhan and Tim-Philipp Müller for setting the necessary build configuration for that feature.

Content Negotiation in Valum

I recently finished and merged support for content negotiation.

The implementation is really simple: one provide a header, a string describing expecations and a callback invoked with the negotiated representation. If no expectation is met, a 406 Not Acceptable is raised.

app.get ("/", negotiate ("Accept", "text/xml; application/json",
                         (req, res, next, ctx, content_type) => {
    // produce according to 'content_type'
}));

Content negotiation is a nice feature of the HTTP protocol allowing a client and a server to negotiate the representation (eg. content type, language, encoding) of a resource.

One very nice part allows the user agent to state a preference and the server to express quality for a given representation. This is done by specifying the q parameter and the negotiation process attempt to maximize the product of both values.

The following example express that the XML version is poor quality, which is typically the case when it’s not the source document. JSON would be favoured – implicitly q=1 – if the client does not state any particular preference.

accept ("text/xml; q=0.1, application/json", () => {

});

Mounted as a top-level middleware, it provide a nice way of setting a Content-Type: text/html; charset=UTF-8 header and filter out non-compliant clients.

using Tmpl;
using Valum;

var app = new Router ();

app.use (accept ("text/html", () => {
    return next ();
}));

app.use (accept_charset ("UTF-8", () => {
    return next ();
}));

var home = new Template.from_path ("templates/home.html");

app.get ("/", (req, res) => {
    home.expand (res.body, null);
});

This is another step forward a 0.3 release!

Fork! in Valum

Ever heard of fork?

using GLib;
using VSGI.HTTP;

var server = new Server ("", () => {
    return res.expand_utf8 ("Hello world!");
});

server.listen (new VariantDict ().end ());
server.fork ();

new MainLoop ().run ();

Yeah, there’s a new API for listening and forking with custom options…

The fork system call will actually copy the whole process into a new process, running the exact same program.

Although memory is not shared, file descriptors are, so you can have workers listening on common interfaces.

I notably tested the whole thing on our cluster at IRIC. It’s a 64 cores Xeon Core i7 setup.

wrk -c 1024 -t 32 http://0.0.0.0:3003/hello

With a single worker:

Running 10s test @ http://0.0.0.0:3003/hello
  32 threads and 1024 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    54.35ms   95.96ms   1.93s    98.78%
    Req/Sec   165.81    228.28     2.04k    86.08%
  41741 requests in 10.10s, 5.89MB read
  Socket errors: connect 35, read 0, write 0, timeout 13
Requests/sec:   4132.53
Transfer/sec:    597.28KB

With 63 forks (64 workers):

Running 10s test @ http://0.0.0.0:3003/hello
  32 threads and 1024 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    60.83ms  210.70ms   2.00s    93.58%
    Req/Sec     2.99k   797.97     7.44k    70.33%
  956577 requests in 10.10s, 135.02MB read
  Socket errors: connect 35, read 0, write 0, timeout 17
Requests/sec:  94720.20
Transfer/sec:     13.37MB

It’s about 1500 req/sec per worker and an speedup of a factor of 23. The latency is almost not affected.

Work on Memcached-GLib

The past few days, I’ve been working on a really nice libmemcached GLib wrapper.

main loop integration
fully asynchronous API
error handling

The whole code is available under the LGPLv3 from arteymix/libmemcached-glib.

It should reach 1.0 very quickly, only a few features are missing:

a couple of function wrappers
integration for libmemcachedutil
async I/O improvements

Once released, it might be interesting to build a GTK UI for Memcached upon that work. Meanwhile, it will be a very useful tool to build fast web applications with Valum.

Proposal for asynchronous delegates in Vala

This post describe a feature I will attempt to implement this summer.

The declaration of async delegate is simply extending a traditional delegate with the async trait.

public async delegate void AsyncDelegate (GLib.OutputStream @out);

The syntax of callback is the same. It’s not necessary to add anything since the async trait is infered from the type of the variable holding it.

AsyncDelegate d = (@out) => {
    yield @out.write_all_async ("Hello world!".data, null);
}

Just like regular callback, asynchronous callbacks are first-class citizen.

public async void test_async (AsyncDelegate callback,
                              OutputStream  @out) {
    yield callback (@out);
}

It’s also possible to pass an asynchronous function which is type-compatible with the delegate signature:

public async void hello_world_async (OutputStream @out)
{
    yield @out.write_all_async ("Hello world!".data);
}

yield test_async (hello_world_async, @out);

Chaining

I still need to figure out how to handle chaining for async lambda. Here’s a few ideas:

refer to the callback using this (weird..)
introduce a callback keyword

AsyncDelegate d = (@out) => {
    Idle.add (this.callback);
    yield;
};

AsyncDelegate d = (@out) => {
    Idle.add (callback);
    yield;
};

How it would end-up for Valum

Most of the framework could be revamped with the async trait in ApplicationCallback, HandlerCallback and NextCallback.

app.@get ("/me", (req, res, next) => {
    if (req.lookup_signed_cookies ("session") == null) {
        return yield next (req, res);
    }
    return yield res.extend_utf8_async ("Hello world!".data);
});

The semantic for the return value would simply state if the request has been handled instead of being eventually handled.