Some rough numbers on WebKit code

My wife asked me for some rough LOC numbers on the WebKit project and I think I could share them with you here as well. They come from r221232. As I’ll take into account some generated code it is relevant to mention that I built WebKitGTK+ with the default CMake options.

First thing I did was running sloccount Source and got the following numbers:

cpp: 2526061 (70.57%)
ansic: 396906 (11.09%)
asm: 207284 (5.79%)
javascript: 175059 (4.89%)
java: 74458 (2.08%)
perl: 73331 (2.05%)
objc: 44422 (1.24%)
python: 38862 (1.09%)
cs: 13011 (0.36%)
ruby: 11605 (0.32%)
xml: 11396 (0.32%)
sh: 3747 (0.10%)
yacc: 2167 (0.06%)
lex: 1007 (0.03%)
lisp: 89 (0.00%)
php: 10 (0.00%)

This number do not include IDL code so I did some grepping to get the number myself that gave me 19632 IDL lines:

$ find Source/ -name ".idl" | xargs cat | grep -ve "^[[:space:]]\/*" -ve "^[[:space:]]*" -ve "^[[:space:]]$" -ve "^[[:space:]][$" -ve "^[[:space:]]};$" | wc -l
19632

The interesting part of the IDL files is that they are used to generate code so those 19632 IDL lines expand to:

ansic: 699140 (65.25%)
cpp: 368720 (34.41%)
python: 1492 (0.14%)
xml: 1040 (0.10%)
javascript: 883 (0.08%)
asm: 169 (0.02%)
perl: 11 (0.00%)

Let’s have a look now at the LayoutTests (they test the functionality of WebCore + the platform). Tests are composed mainly by HTML files so if you run sloccount LayoutTests you get:

javascript: 401159 (76.74%)
python: 87231 (16.69%)
xml: 22978 (4.40%)
php: 4784 (0.92%)
ansic: 3661 (0.70%)
perl: 2726 (0.52%)
sh: 199 (0.04%)

It’s quite interesting to see that sloccount does not consider HTML which is quite relevant when you’re testing a web engine so again, we have to count them manually (thanks to Carlos López who helped me to properly grep here as some binary lines were giving me a headache to get the numbers):

find LayoutTests/ -name ".html" -print0 | xargs -0 cat | strings | grep -Pv "^[[:space:]]$" | wc -l
2205690

You can see 2205690 of “meaningful lines” that combine HTML + other languages that you can see above. I can’t substract here to just get the HTML lines because the number above take into account files with a different extension than HTML, though many of them do include other languages, specially JavaScript.

But the LayoutTests do not include only pure WebKit tests. There are some imported ones so it might be interesting to run the same procedure under LayoutTests/imported to see which ones are imported and not written directly into the WebKit project. I emphasize that because they can be written by WebKit developers in other repositories and actually I can present myself and Youenn Fablet as an example as we wrote tests some tests that were finally moved into the specification and included back later when imported. So again, sloccount LayoutTests/imported:

python: 84803 (59.99%)
javascript: 51794 (36.64%)
ansic: 3661 (2.59%)
php: 575 (0.41%)
xml: 250 (0.18%)
sh: 199 (0.14%)
perl: 86 (0.06%)

The same procedure to count HTML + other stuff lines inside that directory gives a number of 295490:

$ find LayoutTests/imported/ -name ".html" -print0 | xargs -0 cat | strings | grep -Pv "^[[:space:]]$" | wc -l
295490

There are also some other tests that we can talk about, for example the JSTests. I’ll mention already the numbers summed up regarding languages and the manual HTML code (if you made it here, you know the drill already):

javascript: 1713200 (98.64%)
xml: 20665 (1.19%)
perl: 2449 (0.14%)
python: 421 (0.02%)
ruby: 56 (0.00%)
sh: 38 (0.00%)
HTML+stuff: 997

ManualTests:

javascript: 297 (41.02%)
ansic: 187 (25.83%)
java: 118 (16.30%)
xml: 103 (14.23%)
php: 10 (1.38%)
perl: 9 (1.24%)
HTML+stuff: 16026

PerformanceTests:

javascript: 950916 (83.12%)
cpp: 147194 (12.87%)
ansic: 38540 (3.37%)
asm: 5466 (0.48%)
sh: 872 (0.08%)
ruby: 419 (0.04%)
perl: 348 (0.03%)
python: 325 (0.03%)
xml: 5 (0.00%)
HTML+stuff: 238002

TestsWebKitAPI:

cpp: 44753 (99.45%)
ansic: 163 (0.36%)
objc: 76 (0.17%)
xml: 7 (0.02%)
javascript: 1 (0.00%)
HTML+stuff: 3887

And this is all. Remember that these are just some rough statistics, not a “scientific” paper.

Update:

In her expert opinion, in the WebKit project we are devoting around 50% of the total LOC to testing, which makes it a software engineering “textbook” project regarding testing and I think we can be proud of it!

Being taken to Azkaban for use of very dark GStreamer magic

I was writing some tests for a project at Igalia and I need to mock the convert-frame playbin2 element action. The code to invoke it is something like this:

GstElement *pipeline = /* get pipeline */;
GstCaps *caps = /* create caps to adapt the conversion */;
GstBuffer *buffer = NULL;
g_signal_emit_by_name (pipeline, "convert-frame", caps, &buffer);

When you are writing tests, what you want to do is testing just your code and not to depend on something external, so in this case the idea would be providing a fake implementation for that GStreamer element action.

The way you can do this kind of things is providing the symbol in your code so that the linker when doing its job does not look any further and uses that instead of the one in the external library, so the natural solution coming to your mind would be rewriting g_signal_emit_by_name. The problem with this is that though you are not using it in your code, it is too general, so it is not a good idea.

I thought I could replace the convert-frame action in the playbin2 class, so I wrote this code:

typedef struct
{
  GstPipelineClass parent_class;
  void (*about_to_finish) (gpointer playbin);
  void (*video_changed) (gpointer playbin);
  void (*audio_changed) (gpointer playbin);
  void (*text_changed) (gpointer playbin);
  void (*video_tags_changed) (gpointer playbin, gint stream);
  void (*audio_tags_changed) (gpointer playbin, gint stream);
  void (*text_tags_changed) (gpointer playbin, gint stream);
  GstTagList *(*get_video_tags) (gpointer playbin, gint stream);
  GstTagList *(*get_audio_tags) (gpointer playbin, gint stream);
  GstTagList *(*get_text_tags) (gpointer playbin, gint stream);
  GstBuffer *(*convert_frame) (gpointer playbin, GstCaps * caps);
  GstPad *(*get_video_pad) (gpointer playbin, gint stream);
  GstPad *(*get_audio_pad) (gpointer playbin, gint stream);
  GstPad *(*get_text_pad) (gpointer playbin, gint stream);
} GstPlayBinClass;

static gpointer
gst_play_bin_convert_frame (G_GNUC_UNUSED gpointer playbin,
                            G_GNUC_UNUSED gpointer caps)
{
    GstBuffer *buffer;

    /* Create my own GstBuffer with the data I need */

    return buffer;
}

void
simulator_gst_reset(GstElement *new_pipeline, GstBus *new_bus)
{
    /* ... */

    GstPlayBinClass *klass =
        G_TYPE_INSTANCE_GET_CLASS(new_pipeline, GST_PLAY_BIN_TYPE,
                                  GstPlayBinClass);
    klass->convert_frame = (gpointer) gst_play_bin_convert_frame;

    /* ... */
}

First I declared the GstPlayBinClass copying it from the GStreamer code. I didn’t change any parameters order, just replaced some pointers with gpointer as we don’t need them. This way you don’t break the ABI. Then you can declare your own element action code and finally you get the Class, assign the method and voilà!.

As I said, the solution is far from being the best, but if you know a better way, drop me a comment.