======2011 GSoC LSB projects======

[[:gsoc:google-summer-code-2011|Go back to the main GSoC Linux Foundation page]]

Mailing list: [[http://lists.linux-foundation.org/mailman/listinfo/lsb-discuss|lsb-discuss at lists dot linux-foundation dot org]]

IRC: #lsb on irc.linux-foundation.org

[[:lsb:start|Workgroup Resources]]

Code License: GPL/BSD, specs: GNU FDL

Mentors: Alexey Khoroshilov (alexey dot khoroshilov at gmail dot com), Jeff Licquia, Vladimir Rubanov, Denis Silakov (dsilakov at gmail dot com).
===Consistency Checkers for the LSB Database===

LSB Database is the core item of the LSB Infrastructure - it serves as a centralized source of data about standardized interfaces as  well as data about Linux ecosystem. The database schema is relatively complex with more than 100 closely interconnected tables. And there are different constraints that should be met for information stored in the database to be consistent. Some of these restrictions and consistency requirements can be automatically checked by the means of DBMS (e.g., by triggers). Unfortunately, this is not always convinient. There can be consistency violations which are not crucial, so there is no need for everyone to stop the work and try to fix them (which can be non-trivial task). However, they should not be simply ignored, since they are still 'nice to fix'.

In order to detect and track such violations, we use 'consistency checkers' integrated in the LSB Navigator. A checker is usually a SQL request (or a sequence of SQL requests, like stored procedure) wrapped with PHP code to display results in Navigator. Currently there are 48 consistency checkers. However, the LSB DB evolves, and now we see the need for more checkers to be developed.\\ 
Another problem is that in many cases it is possible to automatically detect data inconsistency, but not to fix it. Nevertheless, in some situations it is possible to automatically generate a template for the fix, so developer will only have to adjust it a bit. The Navigator contains a prototype of such fix template generator for a couple of checkers, but in its current form it isn't very useful in case of large number of inconsistencies.\\ \\ 
The goals of this project are:

  * develop new consistency checkers for the LSB DB. Restrictions to be checked will be provided (probably in quite informal form) by LSB maintainers. The student is welcome to propose his own ideas for the checkers to be implemented, when he becomes familiar with the DB schema.
  * Improve (or maybe completely rewrite) generation of templates for inconsistency fixes. Improve consistency checkers (wherever possible) to use this possibility and provide users with fix templates.

Desired knowledge: PHP, MySQL

Mentor: Denis Silakov 
===Improve LSB DevChk===

[[https://www.linuxfoundation.org/en/DevChk|DevChk]] is a test suite used for development environment verification. It verifies different values in specific header files that might affect the ABI specified by LSB.

DevChk is not released as part of the LSB certification program and used for internal activities only - in particular, it allows to track updates in upstream header files and differences between the same header file in different systems. One of the most important purposes of such tracking is to keep header files in the LSB SDK consistent and aligned with the upstream.

We have been using DevChk for years and it proved to be very useful. However, there are several directions for it to be improved. Among others, improvements in the following areas would be useful:

  * DevChk is almost useless for C++ stuff. Though LSB DB doesn't contain much source-level information for C++, some checks still would be useful
  * there should be a way to 'wave' issues - that is, to say 'ok, we know about this discrepancy, please ignore it in future'
  * when DevChk detects an issue which indicates that something is wrong in the LSB Database (which stores information about standardized items and from which DevChk is partially generated), it tries to generate SQL query which will fix the issue. However, this feature is definitely not perfect at the moment and usually the fixes produced are insufficient, manual work is still required

Other improvements are welcome that would allow introducing more automation/intelligence and reduce amount of resources required to analyze DevChk output and fix detected issues. More issues/possible enchancements can be found in the roll-up bug for DevChk improvements in [[http://bugs.linuxbase.org/show_bug.cgi?id=3129|LSB Bugzilla]].

Desired knowledge: Perl, MySQL, C and C++

Mentor: Denis Silakov (dsilakov at gmail dot com).
===CUnit format support for API Sanity Autotest===

The [[http://ispras.linuxfoundation.org/index.php/API_Sanity_Autotest|API Sanity Autotest]] is an automatic generator of basic unit tests for C/C++ libraries. It can quickly create simple ("sanity" or "shallow"-quality) test cases for every library function based on the information about the function signatures, data type definitions and relationships between the functions automatically extracted from library header files. The resulting tests can be used as a "jump-start" basis for more thorough tests. The approach used by this framework proved to be especially useful for large libraries containing thousands of functions. In particular, it was used to develop LSB certification test suites for Qt3, Qt4 and libxml2 libraries.

One of the disadvantages of the framework is that it currently supports generation of output tests only in [[http://sourceforge.net/projects/template2code/|Template2Code]] (T2C) format. Adding support for more formats is a topical task that would allow extending the framework audience.

The goal of this project is to add support for test generation in format used by [[http://cunit.sourceforge.net/|CUnit]] - one of the most widely used frameworks for unit testing at the moment. An optional task for the project is to add support for [[http://sourceforge.net/projects/cppunit/|CppUnit]] test format. As a side effect, the project will allow estimating complexity of introduction of new test formats and identify possible improvements in the framework architecture.

Desired knowledge: C and C++, Perl, basic knowledge of unit testing frameworks, CUnit

Mentor: Vladimir Rubanov
===Adding support for interception of callback operations in kernel modules to KEDR Framework===

[[http://kedr.berlios.de/|KEDR framework]] is a system for dynamic analysis of Linux kernel modules (device drivers, file system modules, etc.). KEDR can operate on specific user-defined modules and can detect memory leaks, perform fault simulation and more.

KEDR is based on interception of function calls performed by modules under analysis. However, many important operations are performed by kernel modules via callback functions registered with appropriate kernel subsystems - file operations, inode operations, address space operations and so on. Currently KEDR is unable to track calls to such callbacks.

Ability to track callback invocation would allow analyzing and verifying behavior of kernel modules more thoroughly. Among other benefits, it will make KEDR memory leak detector more reliable and reduce the number of false positives reported by it.

The goal of this project is to enhance KEDR framework with support for interception of callback operations and to test this functionality on the real-world kernel modules.

Desired knowledge: C, experience with Linux kernel modules development

Mentor: Vladimir Rubanov
===Interpolating procedure for quantified formulas for the LDV Project===

[[http://linuxtesting.org/project/ldv|Linux Driver Verification (LDV)]] program is devoted to developing an integrated framework for device drivers verification and adopting leading edge research results to enhance quality of verification tools. Currently LDV tools are used to analyse a subset of drivers included in the Linux kernel. By the moment, the framework has helped to find [[http://linuxtesting.org/results/ldv|more than 30 issues]] in real-world device drivers.

Currently the LDV static verification tools contain several issues concerning analysis of code handling implicit memory locations (e.g., addressed without explicit variable name but through pointers or arrays). The issues can be resolved using special pointer predicates based on uninterpreted functions with a predefined set of axioms. These axioms can be //universally quantified//, as for example in a statement like this: “An address of a result of a pointer dereference is equal to the pointer value, which is true for any pointer”. ////

However, a component of LDV which generates interpolants does not support universally quantified formulas. At the same time some existing SMT solvers such as [[http://www.cs.nyu.edu/acsys/cvc3/|CVC3]], [[http://research.microsoft.com/en-us/um/redmond/projects/z3/|Z3]] or [[http://yices.csl.sri.com/|Yices]] can decide satisfiability of formulas with universal quantifiers. Unfortunately, they don't implement interpolation procedure.

This project is aimed at developing a tool for such quantified formulas interpolation. The tool can be based on the idea of instantiating axioms used during the proof of a formula unsatisfiability and then using some existing interpolation procedure. This can be achieved by modifying an open source [[http://www.cs.nyu.edu/acsys/cvc3/|CVC3 SMT solver]] (which already supports quantifiers) to produce instances of the axioms used in the proof.

Desired knowledge: C++, decision procedures, Craig interpolation tools, source code analysis.

Mentor: Alexey Khoroshilov
===User-friendly Interface for Linux Driver Verification Tools===

One of the major tasks of the Linux driver verification (LDV) project is to provide easy to use, user-friendly and customizable way to explore verification results for various drivers. The data to be handled is large, since we analyze significant amount of drivers in every version of Linux kernel. Moreover, every driver is analyzed using different techniques and tools and it is important to compare quality of different techniques.

Verification results are stored in a database. Manual requests to the database are not very convenient and some UI is required. Currently we have a simple Web application which is able to summarize results by (kernel version; correctness rule) pairs with possibility to get more details for every pair. Current LDV Web UI  provides support for some analytical tasks and is actively used to analyze issues reported by verification tools and discover bugs in drivers, to compare quality of different verification techniques and so on.

However, different categories of users want to analyze data in different ways. For example, verification tool developers need to see verification time and results shown by their tools, while analysts want to summarize verification results for particular driver. We refer to such different ways of data representation as //profiles//. The database schema provides everything necessary to explore data from different point of views; now we need to support data representation profiles in the Web UI.

The goal of this project is to develop next-generation LDV Web UI which would allow customizing data to be shown by choosing appropriate profiles. Users should be able to select by which fields data should be grouped, specify which details about verification results and tools should be displayed and so on. In addition, different filters should be available that would allow displaying data that meets certain conditions.

Due to large size of data to be handled, much attention should be paid to speed of data processing and page rendering.

Desired knowledge: PHP (experience with Zend Framework is a plus), SQL, JavaScript, HTML, CSS.

Mentor: Alexey Khoroshilov