Java Source Code Contribution

The attached file "sourcecount.py.txt" is the Python 3.0 script that generates the mapping between authors and Java source code files.

The attached file "code_contribution.ods" is the OpenOffice 2.0 Calc spreadsheet used to generate the following statistics from the data produced by the above Python script.

To reproduce the following statistics for other versions of Liferay, change the path variable in the Python script to the root directory of Liferay x.y and run the script.
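The attached script is not reproduced on this page. As a rough illustration of the mapping step, here is a minimal sketch that assumes authorship is read from javadoc `@author` tags (the comments below note that the stats depend on those tags) and that the output is one `filename,author` pair per line. The function name `map_authors`, the placeholder path, and the CSV format are hypothetical, not taken from sourcecount.py. The sketch also targets a current Python 3 rather than 3.0.

```python
import csv
import os
import re
import sys

# Hypothetical placeholder: set this to the root directory of Liferay x.y.
path = "/path/to/liferay-portal-x.y"

# Matches a javadoc "@author Name" tag on a single line and captures the name,
# ignoring trailing spaces or tabs.
AUTHOR_RE = re.compile(r"@author[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def map_authors(root):
    """Yield (filename, author) pairs for every .java file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".java"):
                continue
            full = os.path.join(dirpath, name)
            with open(full, encoding="utf-8", errors="replace") as f:
                text = f.read()
            # One pair per @author tag, so multi-author files yield several rows.
            for author in AUTHOR_RE.findall(text):
                yield full, author


if __name__ == "__main__":
    # lineterminator="\n" avoids csv's default "\r\n" line endings.
    writer = csv.writer(sys.stdout, lineterminator="\n")
    for filename, author in map_authors(path):
        writer.writerow([filename, author])
```

Writing each pair with the `csv` module quotes any author name that contains a comma, which keeps the two columns intact when the file is opened in Calc.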

The script generates an output file. Replace the last two characters of each line in that file with a double quote (") because each line ends with a non-printable character that I'm not sure how to get rid of.
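As an alternative to editing every line by hand, a small cleanup pass can strip the stray character automatically. This is a minimal sketch that assumes the character is a carriage return or some other non-printable character at the end of each line (the post does not say which one it is) and that the columns are separated by commas or tabs. `clean_line` and `clean_file` are hypothetical names. The cleanup keeps every printable character, including quotes and tabs, and drops trailing whitespace and non-printable characters.

```python
import sys


def clean_line(line):
    """Drop trailing whitespace (including '\\r') and any non-printable characters."""
    line = line.rstrip()
    # Tabs are not "printable" to str.isprintable(), so keep them explicitly
    # in case the output is tab-separated.
    return "".join(ch for ch in line if ch == "\t" or ch.isprintable())


def clean_file(src, dst):
    # newline="\n" splits input only on '\n' and leaves any '\r' in place,
    # so clean_line() can remove it instead of it becoming an extra blank line.
    with open(src, encoding="utf-8", newline="\n") as fin, \
         open(dst, "w", encoding="utf-8", newline="\n") as fout:
        for line in fin:
            cleaned = clean_line(line)
            if cleaned:
                fout.write(cleaned + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    clean_file(sys.argv[1], sys.argv[2])
```

Usage: `python clean_output.py raw.txt cleaned.txt` (the script name is hypothetical).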

To generate the statistics, first get the list of authors. Paste the output file's data into columns A and B (filename and author, respectively), then sort the data in ascending order by column B. In cell C1, type "=IF(B1=B2;1;0)" and fill it down to the last row. The result is 1 everywhere except on the last row for each author, where the name in column B changes. Sort the data in ascending order by column C. The rows with a 0 in column C now give each unique author name in column B exactly once.

Next, copy those unique author names and paste them across row 1, starting at column D. In the cell below the first author's name (D2), type "=IF(D$1=$B1;1;0)". The dollar signs keep each formula pointing at the author name in row 1 and at the data in column B as the formula is copied. Drag the fill handle (the small black square at the bottom right of the cell) all the way down so that it covers all the rows in the data. Finally, select the formula cells in column D (not the author name in D1) and drag the fill handle all the way to the right so that it covers all the author columns. The sum of each author's column is that author's credit count, and dividing it by the total number of rows gives the percentage of total.
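The spreadsheet steps above amount to counting how many rows credit each author and dividing by the total number of rows. For example, Brian Wing Shun Chan's 5722 credits out of 7216 give 79.2960% in the table below. A minimal Python equivalent, assuming the cleaned output is a two-column CSV of filename and author (a hypothetical format), could look like this. `author_credits` and `print_table` are hypothetical names. Ties are broken alphabetically here, so the order of equal counts may differ from the table.

```python
import csv
from collections import Counter


def author_credits(rows):
    """Count credits per author from (filename, author) rows.

    Returns (author, credits, percent) tuples sorted by credits descending,
    then by author name for ties.
    """
    counts = Counter(author for _filename, author in rows)
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(author, n, 100.0 * n / total) for author, n in ranked]


def print_table(csv_path):
    """Print a tab-separated Author / Credits / Percentage of Total table."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = [tuple(row[:2]) for row in csv.reader(f) if len(row) >= 2]
    results = author_credits(rows)
    print("Author\tCredits\tPercentage of Total")
    for author, n, pct in results:
        print(f"{author}\t{n}\t{pct:.4f}%")
    print(f"Total\t{sum(n for _, n, _ in results)}\t100.0000%")
```

Because the count is taken per row, a file with several `@author` tags gives one full credit to each of its authors, just as the spreadsheet does.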

Liferay 5.1.2

Authors Credits Percentage of Total
Brian Wing Shun Chan 5722 79.2960%
Jorge Ferrer 269 3.7278%
Raymond Augé 227 3.1458%
Bruno Farache 152 2.1064%
Alexander Chow 150 2.0787%
Prakash Reddy 65 0.9008%
Michael Young 62 0.8592%
Charles May 59 0.8176%
Michael C. Han 56 0.7761%
Ivica Cardic 44 0.6098%
Scott Lee 35 0.4850%
Brian Myunghun Kim 28 0.3880%
Harry Mark 24 0.3326%
Ganesh Ram 21 0.2910%
Sandeep Soni 21 0.2910%
Neil Griffin 20 0.2772%
Deepak Gothe 20 0.2772%
Julio Camarero 19 0.2633%
Thiago Moreira 17 0.2356%
Karthik Sudarshan 17 0.2356%
Olaf Fricke 14 0.1940%
Joel Kozikowski 13 0.1802%
Samuel Kong 11 0.1524%
Eduardo Lundgren 10 0.1386%
Minhchau Dang 8 0.1109%
Brett Randall 7 0.0970%
Jonathan Lennox 6 0.0831%
Alvaro del Castillo 6 0.0831%
Joshna Reddy 6 0.0831%
Jon Steer 4 0.0554%
Zongliang Li 4 0.0554%
Wilson S. Man 4 0.0554%
Ming-Gih Lam 4 0.0554%
Mika Koivisto 4 0.0554%
Allen Chiang 4 0.0554%
Mirco Tamburini 3 0.0416%
Clarence Shen 3 0.0416%
Joseph Shum 3 0.0416%
Prashant Dighe 3 0.0416%
Raju Uppalapati 3 0.0416%
Alberto Montero 3 0.0416%
David Truong 3 0.0416%
Mathias Bogaert 2 0.0277%
Brian Chan 2 0.0277%
Berentey Zsolt 2 0.0277%
Patrick Brady 2 0.0277%
Sten Martinez 2 0.0277%
Tariq Dweik 2 0.0277%
Alan Zimmerman 2 0.0277%
Alex Wallace 2 0.0277%
Shepherd Ching 2 0.0277%
Shuyang Zhou 2 0.0277%
Jesper Weissglas 2 0.0277%
Glenn Powell 2 0.0277%
Jerry Niu 2 0.0277%
James Lefeu 2 0.0277%
Manish Gupta 2 0.0277%
Gavin Wan 2 0.0277%
Jayson Falkner 2 0.0277%
Alex Chow 1 0.0139%
Sergey Ponomarev 1 0.0139%
Alysa Carver 1 0.0139%
Hervé Ménage 1 0.0139%
Santi Kumar 1 0.0139%
Toma Bedolla 1 0.0139%
Wilson Man 1 0.0139%
Javier de Ros 1 0.0139%
James Schopp 1 0.0139%
Jian Cao 1 0.0139%
Steven P. Goldsmith 1 0.0139%
Tang Ying Jian 1 0.0139%
Keith R. Davis 1 0.0139%
Felix Ventero 1 0.0139%
Atul Patel 1 0.0139%
Josiah Goh 1 0.0139%
Marcus Schmidke 1 0.0139%
Michael Lawrence 1 0.0139%
Michael Weisser 1 0.0139%
Britt Courtney 1 0.0139%
Jose Oliver 1 0.0139%
Amos Fong 1 0.0139%
Richard Beatty 1 0.0139%
Rudy Hilado 1 0.0139%
Nate Cavanaugh 1 0.0139%
Arcko Yongming Duan 1 0.0139%
Araceli Checa 1 0.0139%
Andrius Vitkauskas 1 0.0139%
Total 7216 100.0000%

Comments

These stats are not very accurate because Brian Chan is also the head of the QA team and reviews all the code, very often making formatting changes. Also, Service Builder puts his name in the generated classes by default.
Posted on 20/04/09 00:11.
Thanks, Jorge, for the insight into the statistics!

These stats are only as good as the javadoc comments in the Java source files.

Also, thanks for bringing the QA team to light.

I will work on giving partial credit to each author of multi-author documents.
Posted on 20/04/09 14:46 in reply to Jorge Ferrer.
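One possible way to give partial credit to each author of a multi-author file is an equal split: each file contributes a total of one credit, divided evenly among its distinct authors, so a file with two `@author` tags gives each author 0.5. This is only a sketch of that idea under the equal-split assumption, not an implemented feature of the attached script. `partial_credits` is a hypothetical name, and the input is assumed to be the same (filename, author) pairs the script produces.

```python
from collections import defaultdict


def partial_credits(rows):
    """Split one credit per file evenly among that file's distinct authors.

    rows: iterable of (filename, author) pairs.
    Returns a dict mapping author -> fractional credits.
    """
    authors_by_file = defaultdict(set)
    for filename, author in rows:
        # A set ignores an author listed twice in the same file.
        authors_by_file[filename].add(author)

    credits = defaultdict(float)
    for authors in authors_by_file.values():
        share = 1.0 / len(authors)
        for author in authors:
            credits[author] += share
    return dict(credits)
```

With this scheme the credits add up to the number of files rather than the number of `@author` tags, so percentages would be computed against the file count.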