Some rough numbers on WebKit code

My wife asked me for some rough LOC numbers on the WebKit project and I think I could share them with you here as well. They come from r221232. As I’ll take into account some generated code it is relevant to mention that I built WebKitGTK+ with the default CMake options.

First thing I did was running sloccount Source and got the following numbers:

cpp: 2526061 (70.57%)
ansic: 396906 (11.09%)
asm: 207284 (5.79%)
javascript: 175059 (4.89%)
java: 74458 (2.08%)
perl: 73331 (2.05%)
objc: 44422 (1.24%)
python: 38862 (1.09%)
cs: 13011 (0.36%)
ruby: 11605 (0.32%)
xml: 11396 (0.32%)
sh: 3747 (0.10%)
yacc: 2167 (0.06%)
lex: 1007 (0.03%)
lisp: 89 (0.00%)
php: 10 (0.00%)

This number do not include IDL code so I did some grepping to get the number myself that gave me 19632 IDL lines:

$ find Source/ -name ".idl" | xargs cat | grep -ve "^[[:space:]]\/*" -ve "^[[:space:]]*" -ve "^[[:space:]]$" -ve "^[[:space:]][$" -ve "^[[:space:]]};$" | wc -l
19632

The interesting part of the IDL files is that they are used to generate code so those 19632 IDL lines expand to:

ansic: 699140 (65.25%)
cpp: 368720 (34.41%)
python: 1492 (0.14%)
xml: 1040 (0.10%)
javascript: 883 (0.08%)
asm: 169 (0.02%)
perl: 11 (0.00%)

Let’s have a look now at the LayoutTests (they test the functionality of WebCore + the platform). Tests are composed mainly by HTML files so if you run sloccount LayoutTests you get:

javascript: 401159 (76.74%)
python: 87231 (16.69%)
xml: 22978 (4.40%)
php: 4784 (0.92%)
ansic: 3661 (0.70%)
perl: 2726 (0.52%)
sh: 199 (0.04%)

It’s quite interesting to see that sloccount does not consider HTML which is quite relevant when you’re testing a web engine so again, we have to count them manually (thanks to Carlos López who helped me to properly grep here as some binary lines were giving me a headache to get the numbers):

find LayoutTests/ -name ".html" -print0 | xargs -0 cat | strings | grep -Pv "^[[:space:]]$" | wc -l
2205690

You can see 2205690 of “meaningful lines” that combine HTML + other languages that you can see above. I can’t substract here to just get the HTML lines because the number above take into account files with a different extension than HTML, though many of them do include other languages, specially JavaScript.

But the LayoutTests do not include only pure WebKit tests. There are some imported ones so it might be interesting to run the same procedure under LayoutTests/imported to see which ones are imported and not written directly into the WebKit project. I emphasize that because they can be written by WebKit developers in other repositories and actually I can present myself and Youenn Fablet as an example as we wrote tests some tests that were finally moved into the specification and included back later when imported. So again, sloccount LayoutTests/imported:

python: 84803 (59.99%)
javascript: 51794 (36.64%)
ansic: 3661 (2.59%)
php: 575 (0.41%)
xml: 250 (0.18%)
sh: 199 (0.14%)
perl: 86 (0.06%)

The same procedure to count HTML + other stuff lines inside that directory gives a number of 295490:

$ find LayoutTests/imported/ -name ".html" -print0 | xargs -0 cat | strings | grep -Pv "^[[:space:]]$" | wc -l
295490

There are also some other tests that we can talk about, for example the JSTests. I’ll mention already the numbers summed up regarding languages and the manual HTML code (if you made it here, you know the drill already):

javascript: 1713200 (98.64%)
xml: 20665 (1.19%)
perl: 2449 (0.14%)
python: 421 (0.02%)
ruby: 56 (0.00%)
sh: 38 (0.00%)
HTML+stuff: 997

ManualTests:

javascript: 297 (41.02%)
ansic: 187 (25.83%)
java: 118 (16.30%)
xml: 103 (14.23%)
php: 10 (1.38%)
perl: 9 (1.24%)
HTML+stuff: 16026

PerformanceTests:

javascript: 950916 (83.12%)
cpp: 147194 (12.87%)
ansic: 38540 (3.37%)
asm: 5466 (0.48%)
sh: 872 (0.08%)
ruby: 419 (0.04%)
perl: 348 (0.03%)
python: 325 (0.03%)
xml: 5 (0.00%)
HTML+stuff: 238002

TestsWebKitAPI:

cpp: 44753 (99.45%)
ansic: 163 (0.36%)
objc: 76 (0.17%)
xml: 7 (0.02%)
javascript: 1 (0.00%)
HTML+stuff: 3887

And this is all. Remember that these are just some rough statistics, not a “scientific” paper.

Update:

In her expert opinion, in the WebKit project we are devoting around 50% of the total LOC to testing, which makes it a software engineering “textbook” project regarding testing and I think we can be proud of it!