A Day in a Pile of Work

My personal Web development blog

Basepath in Valum

I have recently introduced a basepath middleware and I thought it would be relevant to describe it further.

It’s been possible for a while to compose routers using subrouting, which is very important for writing modular applications.

var app = new Router ();
var user = new Router ();

user.get ("/user/<int:id>", (req, res, next, ctx) => {
    var id = ctx["id"] as string;
    var user = new User.from_id (id);
    res.extend_utf8 ("Welcome %s", user.username);
});

app.rule ("/user", user.handle);

Now, using basepath, it’s possible to design the user router without specifying the /user prefix on rules.

This is very important, because we want to be able to design the user router as if it were the root and rebase it on need upon any prefix.

var app = new Router ();
var user = new Router ();

user.get ("/<int:id>", (req, res) => {
    res.extend_utf8 ("Welcome %s".printf (ctx["id"].get_string ()))
});

app.use (basepath ("/user", user.handle));

How it works

When passing through the basepath middleware, requests whose path has a prefix match with the base path have that prefix stripped and are forwarded.
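
Conceptually, the stripping step looks along these lines (just a sketch of the idea, not the actual implementation; req is assumed to be in scope):

var path = req.uri.get_path ();

if (path.has_prefix ("/user")) {
    // strip the base path and let the subrouter see the rebased path
    req.uri.set_path (path.substring ("/user".length));
} else {
    // not under the base path: leave the request to the next handler
}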

But there’s more!

The middleware also handles errors that set the Location header, namely those from the Success.CREATED and Redirection.* domains.

user.post ("/", (req, res) => {
    throw new Success.CREATED ("/%d", 5); // rewritten as '/user/5'
});

It also rewrites the Location header if it was set directly.

user.post ("/", (req, res) => {
    res.status = Soup.Status.CREATED;
    res.headers.replace ("Location", "/%d".printf (5));
});

Rewriting the Location header is only applied to absolute paths, which start with a leading slash /.
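
In other words, the rewrite boils down to something like this (a sketch; "/user" stands for the configured base path):

var location = res.headers.get_one ("Location");

if (location != null && location.has_prefix ("/")) {
    // only absolute paths are rebased on the prefix
    res.headers.replace ("Location", "/user" + location);
}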

It can easily be combined with the subdomain middleware to provide a path-based fallback:

app.subdomain ("api", api.handle);
app.use (basepath ("/api/v1", api.handle));

Posted on .

Just reached 6.3k req/sec in Valum

I often profile Valum’s performance with wrk to ensure that no regression hits the stable release.

It helped me identify a couple of mistakes in various implementations.

Anyway, I’m glad to announce that I have reached 6.3k req/sec on a small payload, all relative to my very low-grade Acer C720.

The improvements are available in the 0.2.14 release.

  • wrk with 2 threads and 256 connections running for one minute
  • Lighttpd spawning 4 SCGI instances

Build Valum with examples and run the SCGI sample:

./waf configure build --enable-examples
lighttpd -D -f examples/scgi/lighttpd.conf

Start wrk:

wrk -t 2 -c 256 -d 1m http://127.0.0.1:3003/

Enjoy!

Running 1m test @ http://127.0.0.1:3003/
  2 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    40.26ms   11.38ms 152.48ms   71.01%
    Req/Sec     3.20k   366.11     4.47k    73.67%
  381906 requests in 1.00m, 54.31MB read
Requests/sec:   6360.45
Transfer/sec:      0.90MB

There are still a few things to get done:

  • hanging connections benchmark
  • throughput benchmark
  • logarithmic routing #144

The trunk buffers SCGI requests asynchronously, which should improve the concurrency with blocking clients.

Lighttpd is not really suited for throughput because it buffers the whole response. Sending a lot of data is problematic and uses up a lot of memory.

Valum is designed with streaming in mind, so it has a very low (if not negligible) memory footprint.

I reached 6.5k req/sec, but since I could not reliably reproduce it, I preferred posting these results.

Posted on .

Progress Update in Valum

My up key stopped working, so I’m kind of forced into vim motions.

All warnings have been fixed and I’m looking forward to enforcing --experimental-non-null as well.

The response head is automatically written on disposal and the body is not closed explicitly when a status is thrown.

In the meantime, I managed to backport write-on-disposal into the 0.2.12 hotfix.

I have written a formal grammar and I am working on an implementation that will be used to reverse rules.

VSGI Redesign

There’s some work on a slight redesign of VSGI where we would allow the ApplicationCallback to return a value. It would simplify calling the next continuation:

app.get ("", (req, res, next) => {
    if (some_condition)
        return next (req, res);
    return res.body.write_all ("Hello world!".data, null);
});

In short, the returned boolean tells whether the request is or will eventually be handled.

The only thing left is to decide what the server will do about unhandled requests.

Update (Feb 21, 2016)

This work has been merged and it’s really great because it provides major improvements:

  • no more 404 at the bottom of perform_routing, we use the return value to determine if any route has matched
  • OPTIONS work even if no route has matched: a 404 Not Found would be thrown with the previous approach

Yeah, we are even handling OPTIONS! It produces a zero-length body with the Allow header set to the list of methods available for the resource.

Expand!

The Response class now has expand and expand_utf8 methods that work similarly to flatten for Request.

app.get ("", (req, res) => {
    return res.expand_utf8 ("Hello world!");
});

It will deal with writing the head, piping the passed buffer and closing the response stream properly.

The asynchronous versions are provided if gio (>=2.44) is available during the build.

SCGI improvements

Everything is now buffered in a single step and resized as needed if the request body happens not to fit in the default 4 KiB buffer.

I noticed that set_buffer_size literally allocates and copies over data, so we avoid that!

I have also worked on some defensive programming to cover more cases of failure with the SCGI protocol:

  • encoded lengths are parsed with int64.try_parse, which prevents a segfault
  • a missing CONTENT_LENGTH environment variable is properly handled

I noticed that SocketListener also listens on IPv6 if available, so the SCGI implementation has a touch of modernity! This is not available (yet) for FastCGI.

Right now, I’m working on supporting UNIX domain sockets for the SCGI and libsoup-2.4 implementations.

It’s rolling at 6k req/sec behind Lighttpd on my shitty Acer C720, so enjoy!

I have also fixed errors with the FastCGI implementation: it hit a kind of major issue in the Vala language. In fact, it’s not possible to both return a code and throw an exception, which led to an inconsistent return value in OutputStream.write.

To temporarily fix that, I had to suppress the error and return -1. I’ll have to hack this out eventually.

In short, I managed to make VSGI more reliable under heavy load, which is a very good thing.

Posted on .

v0.2.9 Released! in Valum

I have just backported important fixes from the latest developments in this hotfix release.

  • fix blocking accept call
  • async I/O with FastCGI with UnixInputStream and UnixOutputStream
  • backlog defaults to 10

The blocking accept call was a real pain to work around, but I finally ended up with an elegant solution:

  • use a threaded loop for accepting a new request
  • delegate the processing into the main context

FastCGI multiplexes multiple requests on a single connection, so it’s hard to perform efficient asynchronous I/O. The only thing we can do is poll the unique file descriptor we have, and to do that correctly, why not reuse gio-unix-2.0?

The streams are reimplemented by deriving UnixInputStream and UnixOutputStream and overriding read and write to write a record instead of the raw data. That’s it!
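
Here is a hedged sketch of the shape of it (not the actual VSGI code, and it ignores padding and partial writes): data written to the stream is framed into a FastCGI FCGI_STDOUT record before reaching the file descriptor.

public class RecordOutputStream : UnixOutputStream {

    private uint16 request_id;

    public RecordOutputStream (int fd, uint16 request_id) {
        base (fd, false);
        this.request_id = request_id;
    }

    public override ssize_t write (uint8[] buffer, Cancellable? cancellable = null) throws IOError {
        var len = buffer.length > uint16.MAX ? uint16.MAX : (uint16) buffer.length;
        uint8[] header = {
            1,                                              // protocol version
            6,                                              // record type: FCGI_STDOUT
            (uint8) (request_id >> 8), (uint8) request_id,  // request id
            (uint8) (len >> 8),        (uint8) len,         // content length
            0,                                              // padding length
            0                                               // reserved
        };
        base.write (header, cancellable);
        return base.write (buffer[0:len], cancellable);
    }
}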

I have also been working on SCGI: the netstring processing is now fully asynchronous. I couldn’t backport it as it depends on other breaking changes.

Posted on .

Roadmap for the 0.3 series in Valum

The 0.2 series has focused on bringing the basics that will guide the upcoming features.

Here’s the roadmap for the next release 0.3 of Valum:

  • aggressive optimization based on GLib.Sequence
  • content negotiation
  • HTTP authentication (basic and digest)
  • static resource delivery
  • VSGI loader
  • multipart streams
  • typed rule parameters using GType
  • filters
  • Python and Gjs bindings

The following features have been integrated in the trunk:

  • flags for HTTP methods
  • default HEAD to GET (still need to strip the body)
  • a routing context that provides states and services, rather than a stack
  • shared libraries for VSGI implementations compatible with GLib.TypeModule
  • Router.asterisk to handle asterisk URI *
  • inheritance for Route to split concerns
  • fewer responsibilities in Route subclasses: scopes, types and other features are encapsulated in Router
  • register_type API for the incoming typed parameters

Status handlers have been reworked and cleaned up to cover more cases using a switch block.

  • status handlers are executed in the Router context rather than in a double try-catch, for consistent behaviour
  • the error message is only used for headers that MUST be part of the response, except for redirection codes

Aggressive Optimizations

Okay, let me calm down. The idea is to bring routing into the O(log n) world by using a GLib.Sequence, which consists of a binary tree data structure.

Basically, we have a sequence of Route objects and we try to find in the least number of attempts the next one that accepts a given request.
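
To illustrate the data structure itself (this is not Valum code; plain strings stand in for the Route objects, sorted by an exclusive criterion):

var routes = new Sequence<string> ();

// sorted insertion keeps the underlying balanced binary tree ordered
routes.insert_sorted ("GET /", (a, b) => strcmp (a, b));
routes.insert_sorted ("GET /user/<int:id>", (a, b) => strcmp (a, b));
routes.insert_sorted ("POST /user", (a, b) => strcmp (a, b));

// 'search' walks the tree in O(log n) and returns an iterator positioned
// right after the entries that compare lower than the probe
var iter = routes.search ("GET /", (a, b) => strcmp (a, b));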

In short, here’s what should be done:

  • sorting by exclusive criteria (method, version, …)
  • sorting by usage
  • pre-lookup using a trie-based index

I still need to figure out more, it’s all in issue #144.

Middlewares

With the common ground set, this series will bring useful middlewares to process Request and Response efficiently.

To avoid confusion about what a middleware is, I decided to apply the term only to HandlerCallback instances.

Content Negotiation

Content negotiation is implemented as a set of middlewares which check the request, set the appropriate headers and forward the processing.

If the produced resource is not acceptable, next is called unless NegociateFlags.FINAL is specified, in which case a 406 Not Acceptable is raised.

app.get ("", accept ("text/html", (req, res) => {
    res.body.write_all ("<!DOCTYPE html><html>Hello world!</html>");
}));

app.get ("", accept ("text/plain", (req, res) => {
    res.body.write_all ("Hello world!");
}, NegociateFlags.FINAL));

The latest improvements are awaiting tests.

Static Resources Delivery

Static resources can be served from a path or a resource bundle. It supports multiple options:

  • ETag, which identifies the resource uniquely to prevent retransmission
  • Last-Modified (only for path)
  • X-Sendfile if the HTTP server supports it
  • mark the delivered resource as public for caches
  • deliver asynchronously

Delivery from GLib.Resource defaults to the global resources.

The only requirement is to provide a path key in the routing context, which can be easily done with a rule or a regular expression:

using Valum.Static;

app.get ("<path:path>", serve_from_resources (ServeFlags.ENABLE_ETAG));

It’s living here in #143.

Flags for HTTP methods

This is a really nice feature.

HTTP methods are now handled as flags in the Router to perform very efficient matching.

Standard methods are available in the Method enumeration along with the following symbols:

  • ALL to capture all standard methods
  • OTHER to capture non-standard methods
  • ANY to capture any method

If OTHER is specified, it must be implemented in the matching callback.
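
For instance, a handler registered for OTHER would check the raw method itself (a hedged example, not taken from the documentation):

app.rule (Method.OTHER, "", (req, res) => {
    if (req.method == "PATCH") {
        // handle the non-standard method here
    }
});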

all and methods have been removed from Router for obvious reasons and method has been renamed to rule to remain consistent.

app.rule (Method.GET | Method.POST, "", () => {

});

app.rule (Method.ALL, "", () => {

});

app.rule (Method.ANY, "", () => {

});

Method.GET actually stands for Method.ONLY_GET | Method.HEAD so that it can also capture HEAD requests. It’s pretty handy, but I still need to figure out how to strip the produced body.
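
For instance, a route declared with get also answers HEAD requests, for now with the body still attached:

app.get ("hello", (req, res) => {
    // matched for both GET and HEAD requests
    res.body.write_all ("Hello world!".data, null);
});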

VSGI Loader

More details here: #130.

Loading of applications described as dynamic modules (see GModule for more details) will be brought by a small utility named vsgi. It will be able to spawn instances of the application using a VSGI implementation.

The application has to be written in a specific manner and provide at least one entry point:

public HandlerCallback app;

[CCode (cname = "g_module_check_init")]
public string? check_init () {
    var _app = new Router ();

    ...
    app = _app.handle;
    return null; // null indicates a successful initialization
}

[CCode (cname = "g_module_unload")]
public void unload () {
    app = null;
}
The utility would then spawn the module like so:

vsgi --directory=build --server=scgi app:app

All VSGI implementations are loadable and compatible with GLib.TypeModule.

The application is automatically reloaded on SIGHUP and it should be possible to implement live reloading with GLib.FileMonitor to facilitate the development as well as integration of Ivy to beautify the stack trace.
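
A hedged sketch of what live reloading could look like, assuming a GLib.FileMonitor watch on the shared library (the path is hypothetical and this is not an existing feature):

try {
    var module_file = File.new_for_path ("build/libapp.so");
    var monitor = module_file.monitor_file (FileMonitorFlags.NONE);

    monitor.changed.connect ((file, other_file, event) => {
        if (event == FileMonitorEvent.CHANGES_DONE_HINT) {
            // unload the old GLib.TypeModule and load the new one here
        }
    });

    // keep a reference to 'monitor' alive for as long as the watch is needed
} catch (Error err) {
    warning ("could not watch the module: %s", err.message);
}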

Multipart I/O Streams

The multipart stream is essential for any web application that would let clients submit files.

The implementation will be compatible with Soup.MultipartInputStream.

app.post ("", (req, res) => {
   var multipart_body = new MultipartInputStream (req.headers, req.body);

   InputStream? part;
   MessageHeaders part_headers;
   while (part = multipart_body.next_part (out part_headers) != null) {
      if (part_headers.get_content_disposition ())
   }
});

Typed Rule Parameters

The use of GType to interpret and convert rule parameters is essential for an optimal integration with GLib.

The idea is to declare types on the Router and attempt a conversion before pushing the parameter on the context.

app.register_type ("int", /\d+/, typeof (int));

app.get ("<int:i>", (req, res, next, ctx) => {
    var i = ctx["i"].get_int ();
});

Type conversion can be registered with GLib.Value.register_transform_func:

Value.register_transform_func (typeof (string),
                               typeof (int),
                               (src, ref dest) => {
    dest.set_int (int.parse (src.get_string ()));
});

One useful approach would be to reverse that process to generate URLs given a rule.

Posted on .

Quick Update in Valum

I couldn’t touch the framework much these last days due to my busy schedule, so I just wanted to write a few words.

I like the approach used by Express.js to branch in the routing by providing a forward callback and calling it if some conditions are met.

It is used for content negotiation and works quite nicely.

app.get ("", accept ("text/html", (req, res, next) => {
    // user agent understands 'text/html'

    // well, finally, it's not available
    next (req, res);
}));

app.get ("", (req, res) => {
    // user agent wants something else
});

Other negotiators are provided for the charset, encoding and much more. All the wildcards defined in the HTTP/1.1 specification are understood.
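
For instance, negotiating the charset could look like this (the accept_charset name is an assumption following the pattern above):

app.get ("", accept_charset ("UTF-8", (req, res, next) => {
    // the user agent accepts UTF-8 encoded content
}));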

The code for static resource delivery is almost ready. I am finishing some tests and it should be merged.

It supports the production of the following headers (with flags):

  • ETag
  • Last-Modified
  • Cache-Control: public

And it can deliver resources from a GResource bundle or a GFile path. This also means that any GVFS backends are supported.

If the resource is not found, next is invoked to dispatch the request to the next handler.

One last thing is GSequence, which stores a sorted sequence in a binary tree. I think that if we can sort Route objects in some way, this could provide really effective routing in logarithmic time.

Or using a Trie

Posted on .

Sixteenth Week Update in Valum

This last weekly update marks the final release of valum-0.2 and a couple of things happened since the last beta release:

  • code and documentation improvements
  • handle all status codes properly by using the message as a payload
  • favour read_all over read and write_all over write for stream operations
  • all and methods now return the array of created Route objects
  • move cookies-related utilities back to VSGI
  • sign and verify for cryptographically secure cookies
  • filters and converters for Request and Response

I decided to move the cookies-related utilities back into VSGI, considering that VSGI provides a layer over libsoup-2.4 and the cookies utilities simply adapt to the Request and Response objects.

I introduced sign and verify to perform cookie signature and verification using HMAC.

using Soup;
using VSGI;

var cookie = new Cookie ("name", "value", ...);

cookie.@value = Cookies.sign (cookie, ChecksumType.SHA512, secret);

string @value;
if (Cookies.verify (cookie.@value, ChecksumType.SHA512, secret, out @value)) {
    // cookie is authentic and value is stored in @value
}

The signing process uses an HMAC signature over the name and value of the cookie to guarantee that we have produced the value and associated it with the name.

The signature is computed as follows:

HMAC (algorithm, secret, HMAC (algorithm, secret, value) + name) + value

Where

  • the algorithm is chosen from the GChecksumType enumeration
  • the secret is chosen by the application
  • the name and value are from the cookie
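
For illustration, here is the same computation written with GLib.Hmac (a hedged sketch, not the VSGI code; the parameter names are placeholders):

public string sign_sketch (ChecksumType algorithm, uint8[] secret, string name, string value) {
    var inner = Hmac.compute_for_string (algorithm, secret, value);
    var outer = Hmac.compute_for_string (algorithm, secret, inner + name);
    return outer + value;
}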

The documentation has been updated with the latest changes and some parts were rewritten for better readability.

Filters and converters

Filters and converters are the basis for creating filters for Request and Response objects. They allow a handling middleware to apply composition on these objects to change their typical behaviour.

Within Valum, this is integrated by passing the Request and Response objects to the NextCallback.

app.get ("", (req, res, next) => {
    next (req, new BufferedResponse (res));
}).then ((req, res) => {
    // all operations on res are buffered, data are sent when the
    // stream gets flushed
    res.write_all ("Hello world!".data, null);
});

This is just a beginning: future releases will introduce a wide range of filters to create flexible pipelines.

Work on Mirdesign testcases

I have been actively working on Mirdesign testcases and finished its API specification.

  • final API specification
  • poll for status update
  • grammar productions and tokens implementation in JavaScript to generate code from an AST

The work on grammar productions and tokens in JavaScript will eventually lead to a compliant implementation of Mirdesign which will be useful if we decide to go further with the project. The possible outcome would be to provide all the capabilities of the language in an accessible manner to people in the field.

To easily integrate dependencies, Browserify is used to bundle relevant npm packages.

  • store.js to store data in localStorage with multiple fallbacks
  • codemirror to let user submit its own design
  • lex to lex a Mirdesign input
  • numeral to generate well-formatted numbers according to the Mirdesign EBNF
  • fasta-parser to parse a fasta input

I updated the build system to include the JS compilation with Google Closure Compiler and the generation of API documentation with Swagger in the parallel build. I first thought of using another build system specialized in compiling front-end applications, but waf is already well adapted to our needs.

bld(
    rule   = 'browserify ${SRC} | closure-compiler --js_output_file ${TGT} -',
    target = 'mirdesign.min.js',
    source = 'mirdesign.js')

Google Closure Compiler performs static type checking, minification and generation of highly optimized code. It was essential to ensure type safety for the use of productions and tokens.

JSDoc is used to produce the documentation for the productions and tokens, as well as the code backing the user interface.

I decoupled the processing function from the ThreadPool as we will eventually target a cluster using TORQUE to perform computation.

Long-term features:

  • API key to control resource usage
  • monitor CPU usage per user
  • theme for the user interface

Posted on .

Fourteenth Week Update in Valum

The 0.2.0-beta has been released with multiple improvements and features that were described in the preceding update. It can be downloaded from GitHub as usual or installed from the Docker image.

The documentation has been nicely improved with more contextual notes to put emphasis on important points.

The framework has reached a really good level of stability and I should promptly release a stable version in the coming week.

There are a couple of features that I think could be worthwhile in the stable release:

  • listen to multiple sources (socket, file descriptor, …)
  • listen to an arbitrary socket using a descriptive URL

I have implemented a lookup function for cookies which finds a cookie in the request headers by its name.

var cookie = Cookies.lookup ("session", req.headers);

Mirdesign

I started working more seriously on the side project as I was able to meet up with Nicolas Scott to discuss what kind of web applications will be developed with Valum.

Mirdesign HIV prototype built with Semantic UI.

But first, let me briefly introduce you to his work. He works on µRNA simulations using a modified version of an algorithm that performs matches between two sets: µRNA and genes (messenger RNA) from a cell line.

He developed a language that let one efficiently describe and execute simulations. It does not have a name, but the whole thing is named Mirdesign, “Mir” standing for µRNA.

The web application will become a showcase for his work by providing specific testcases his language can actually describe. It consists of two layers:

  • an API written with Valum and backed by a worker pool, memcached, MySQL and JSON, documented with Swagger
  • a client written with Semantic UI that consumes the API

As of now, we decided to go on with an HIV testcase that would let one select a cell line, the amount of µRNA to pour and some extra genes that could be specified or extracted from a FASTA file.

If it works well, other testcases will be implemented to cover yet unexplored aspects of Mirdesign.

There are still a couple of things to work out:

  • parsing the FASTA file (fasta will be used)
  • generating a Mirdesign word from user input
  • exposing (partially) the MySQL database through the web API
  • change the processing backend (TORQUE or other cluster)

Posted on and tagged with Mirdesign and Valum.

Twelfth week update (from 29/06/15 to 17/07/15) in Valum

I have been very busy in the last weeks so this update will cover the work of three weeks instead of a typical bi-weekly update.

There’s no release announcement as I have been working on the assignment I have to complete with the framework and have steadily worked toward the 0.2.0-beta release.

Alongside, I have been working on features for the 0.3 series, which will introduce middlewares. I have prototyped the following:

  • HTTP authentication (basic and digest)
  • content negotiation
  • static resources

I have also introduced then in Route, which is a really handy feature for creating handling sequences, and implemented the trailer from the chunked encoding.

Update from Colomban!

I had an unexpected update from the developer of CTPL, Colomban Wendling. We talked a few weeks ago about the possibility of having GObject Introspection in the library so that we could generate decent bindings for Vala.

He’s got something working and I will try to keep a good eye on the work so that we can eventually ship a better binding for the templating engine.

CTPL is a good short-term solution for templating and if the library evolves and integrates new features, it could possibly be a replacement for a possible Mustache implementation.

The big issue with CTPL is the lack of basic features:

  • filters
  • mappings
  • arrays of arrays

Filters let one attach a function to the environment so that it can be applied on the variables instead of pre-processing the data.

Mappings could be easily implemented if a Ctpl.Environ were allowed to contain other environs.

Containers are limited to hold scalars of the same type, which is quite restrictive and prevents many usages.

Working prototype

I have a working prototype for the assignment that I will briefly describe here.

In order to expose the algorithm developed by Nicolas Scott for his Ph.D. thesis, I decided to describe a RESTful API with the following endpoints:

PUT /task
GET /task/{uuid}
DELETE /task/{uuid}
GET /task/{uuid}/results
GET /tasks
GET /statistics

The task is still a very generic concept as I do not know much about what kind of data will be poured into the program.

  1. client submits a task with its data
  2. the task is created (stored in memcached) and then queued in a ThreadPool
    • the pool eventually process the task in a worker thread
  3. client requests the results of the task and one of the following scenarios occurs:
    • the task is queued or still processing and a 4xx is thrown
    • the task is completed and the result is transmitted

I have finished bindings for libmemcachedutil, which provides a connection pool that has roughly doubled the throughput.

There’s still a few things to do:

  • GLib.MainLoop integration for libmemcached bindings to let the loop schedule request processing and memcached operations
  • client based on Semantic UI (in progress…)

Semantic UI has nice API mapping capabilities that will be very useful to present the data interactively.

The project will be containerized and shipped probably on Google Compute Engine as it supports Docker.

Then…

This feature is really handy as it is common to reuse the matching process for a sequence of handling callbacks. It will be introduced in the 0.3 branch as it will work nicely alongside middlewares.

app.get ("", (req, res, next) => {
    // step 1
}).then ((req, res, next) => {
    // step 2
}).then ((req, res) => {
    // step 3
});

Posted on .