This practice is a specialization of the Java Analysis, Visualization & Generation Practice for generation of JUnit tests. In particular:
The above diagram shows Java development activities and artifacts. Black arrows show the typical process; blue arrows show the test generation loop.
The developer produces source artifacts, which may include non-Java artifacts used to generate Java code (e.g. Ecore models), “main” Java sources, and test Java sources. Java sources are compiled into bytecode (class files). It is important to note that matching bytecode classes and methods to source code classes and methods might be non-trivial because of:
JUnit tests are compiled and executed. If a code coverage tool such as JaCoCo is configured, then test execution produces coverage data. JaCoCo stores coverage data in a jacoco.exec file. This file is used to generate a coverage report and to upload coverage information to systems like SonarQube. In this practice it is also used to select which methods to generate tests for, based on coverage data.
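As an illustration, below is a minimal sketch of selecting under-covered methods from jacoco.exec and class files using the JaCoCo API. The file locations assume a typical Maven layout and are not prescribed by the practice itself.

```java
import java.io.File;
import java.io.IOException;

import org.jacoco.core.analysis.Analyzer;
import org.jacoco.core.analysis.CoverageBuilder;
import org.jacoco.core.analysis.IClassCoverage;
import org.jacoco.core.analysis.IMethodCoverage;
import org.jacoco.core.tools.ExecFileLoader;

public class CoverageCandidates {

    public static void main(String[] args) throws IOException {
        // Typical Maven locations - adjust to your build layout
        File execFile = new File("target/jacoco.exec");
        File classFiles = new File("target/classes");

        // Load execution data produced by the JaCoCo agent during the test run
        ExecFileLoader loader = new ExecFileLoader();
        loader.load(execFile);

        // Match execution data against bytecode to compute per-method coverage
        CoverageBuilder coverageBuilder = new CoverageBuilder();
        Analyzer analyzer = new Analyzer(loader.getExecutionDataStore(), coverageBuilder);
        analyzer.analyzeAll(classFiles);

        // Methods with missed lines are candidates for test generation
        for (IClassCoverage classCoverage : coverageBuilder.getClasses()) {
            for (IMethodCoverage methodCoverage : classCoverage.getMethods()) {
                int missed = methodCoverage.getLineCounter().getMissedCount();
                if (missed > 0) {
                    System.out.println(
                            classCoverage.getName() + "." + methodCoverage.getName()
                            + methodCoverage.getDesc() + " missed lines: " + missed);
                }
            }
        }
    }

}
```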
This diagram provides an insight into the test generation activity:
The following section provides an overview of two “local loop” reference implementations (a.k.a. designs/embodiments) - all-in-one and componentized. There are many possible designs leveraging different alternatives at multiple variation points. The sections after the reference implementations section provide an overview of variation points, alternatives, and factors to take into consideration during alternative selection.
Nasdanika CLI features a JUnit command which generates JUnit tests as explained above.
This section explains the reference implementations.
All-in-one generation is implemented as a JUnit test and is available in TestGenerator. PetControllerTests is an example of tests generated by this generator.
As the name implies, all steps of source analysis and generation are implemented in a single class and are executed in one go.
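A condensed sketch of this shape of generator is shown below. Everything apart from the JUnit annotations is hypothetical - in particular the ChatClient interface stands in for whichever GenAI client the actual TestGenerator uses, and the paths assume a Maven layout.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.junit.jupiter.api.Test;

public class AllInOneTestGeneratorSketch {

    // Placeholder for whatever GenAI client the actual solution uses
    interface ChatClient {
        String chat(String prompt);
    }

    private final ChatClient chatClient = prompt -> "// generated test body placeholder";

    @Test
    public void generateTests() throws Exception {
        Path sourceRoot = Path.of("src/main/java");         // assumption: Maven layout
        Path testRoot = Path.of("target/generated-tests");  // assumption: output location

        List<Path> javaSources;
        try (Stream<Path> walk = Files.walk(sourceRoot)) {
            javaSources = walk.filter(p -> p.toString().endsWith(".java")).collect(Collectors.toList());
        }

        for (Path source : javaSources) {
            String code = Files.readString(source);
            // The real implementation builds the prompt from the source model, bytecode
            // and coverage data; here the prompt is just the raw source.
            String prompt = "Generate JUnit 5 tests for the following Java class:\n" + code;
            String generated = chatClient.chat(prompt);
            // A real generator would also rename Foo.java to FooTests.java
            Path target = testRoot.resolve(sourceRoot.relativize(source));
            Files.createDirectories(target.getParent());
            Files.writeString(target, generated);
        }
    }

}
```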
Componentized test generation, which is also executed in one go, is implemented in these classes:
TestGitLab demonstrates how to scan a source repository (GitLab) using the REST API, inspect code, generate unit tests, commit them to the server (also over the REST API), and create a merge request. This implementation does not use coverage information; its purpose is to demonstrate operation over the REST API without having to clone a repository, which might be an expensive operation. The implementation uses the GitLab Model to communicate with the repository. It uses the Java model to load sources and a StringBuilder to build test cases.
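For orientation, here is a rough sketch of the underlying REST interaction using plain java.net.http instead of the GitLab Model. The project id, branch names, file paths and token handling are made-up examples, and JSON is assembled by hand where a real implementation would use a proper client or JSON library.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class GitLabRestSketch {

    private static final String API = "https://gitlab.example.com/api/v4/projects/123"; // hypothetical project
    private static final String TOKEN = System.getenv("GITLAB_TOKEN");

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // 1. Read a source file without cloning the repository
        String path = URLEncoder.encode("src/main/java/com/example/PetController.java", StandardCharsets.UTF_8);
        HttpRequest getSource = HttpRequest.newBuilder()
                .uri(URI.create(API + "/repository/files/" + path + "/raw?ref=main"))
                .header("PRIVATE-TOKEN", TOKEN)
                .GET()
                .build();
        String source = client.send(getSource, HttpResponse.BodyHandlers.ofString()).body();

        // 2. Generate a test from the source (GenAI call omitted here)
        String generatedTest = "// ... test generated from:\n// " + source.lines().findFirst().orElse("");

        // 3. Commit the generated test to a new branch via the commits API
        String commitBody = """
                {
                  "branch": "generated-tests",
                  "start_branch": "main",
                  "commit_message": "Add generated tests",
                  "actions": [
                    { "action": "create",
                      "file_path": "src/test/java/com/example/PetControllerTests.java",
                      "content": %s }
                  ]
                }
                """.formatted(toJsonString(generatedTest));
        HttpRequest commit = HttpRequest.newBuilder()
                .uri(URI.create(API + "/repository/commits"))
                .header("PRIVATE-TOKEN", TOKEN)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(commitBody))
                .build();
        client.send(commit, HttpResponse.BodyHandlers.ofString());

        // 4. Open a merge request for review
        String mrBody = """
                { "source_branch": "generated-tests", "target_branch": "main", "title": "Generated JUnit tests" }
                """;
        HttpRequest mergeRequest = HttpRequest.newBuilder()
                .uri(URI.create(API + "/merge_requests"))
                .header("PRIVATE-TOKEN", TOKEN)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(mrBody))
                .build();
        client.send(mergeRequest, HttpResponse.BodyHandlers.ofString());
    }

    // Naive JSON string escaping - a real implementation would use a JSON library
    private static String toJsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }

}
```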
As you have seen above, you can have an AI-powered JUnit test generator in about 230 lines of code, and maybe that is all you need. However, there are many variation points (design dimensions), alternatives at each point and, as such, many possible permutations thereof (designs). This section provides a high-level overview of variation points and alternatives. How to assemble a solution from those alternatives is specific to your context; there might be different solutions for different contexts and multiple solutions complementing each other. As you proceed with assembling a solution, or a portfolio of solutions, you may identify more variation points and alternatives. To manage the complexity you may use:
In this section we’ll use the diagram below and the concept of an Enterprise with Stakeholders performing activities and exchanging Messages over Channels.
The mission of our enterprise is to deliver quality Java code. The loss function to minimize is:

loss = cost * risk / business value

For our purposes we’ll define risk as the fraction of code not covered by tests - that’s all we can measure in this simple model:

risk = missed lines / total lines

The cost includes resource costs - salaries and usage fees for OpenAI.
Below is a summary of our enterprise:
The sections below outline variation points and alternatives for the list items above.
A developer writes code - both “business” code and tests. They use some kind of an editor, likely an IDE - Eclipse, IntelliJ, VS Code. Different IDEs come with different sets of plug-ins, including AI assistants. Forcing a developer to switch from their IDE of preference to another IDE is likely to cause a considerable productivity drop, at least for some period of time, even if the new IDE is considered superior to the old one. So, if you want to switch to another IDE just because it has some plug-in which you like - think twice.
A build machine compiles code and executes tests. Technically, compilation and test execution could be separated into two individual activities. We are not doing that for this analysis because it doesn’t carry much relevance to test generation; you can do it for yours.
The test generator generates tests by “looking” at the source code, bytecode, and code coverage results.
Because the source code is loaded into a model whose elements represent pieces of code (methods, constructors, …), the generator may traverse the model to “understand” the context. E.g. it may take a look at the method’s class and at other classes in the module. If the sources are loaded from a version control system, it may take a look at the commits. And if the source model is part of an organization model, it may look at “sibling” modules and other resources.
By analyzing source and bytecode the generator knows which methods a given method calls, which objects it creates, and which methods call it. It also “knows” branch conditions, e.g. switch cases. Using this information the generator may:
The test generator may do the following with the code generated by GenAI:
In addition to code generation, the generator may ask GenAI to explain code and generate recommendations - this will help the developer understand the source method and possibly improve it along the way. It may also generate dependency graphs and sequence diagrams.
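To make the bytecode analysis above more concrete, here is a minimal sketch of extracting the outgoing calls of each method with the ASM library - one possible way to do it, not necessarily the one used by the Java model. The class resource path is a placeholder.

```java
import java.io.IOException;
import java.io.InputStream;

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class OutgoingCalls {

    public static void main(String[] args) throws IOException {
        // Placeholder class path - read the bytecode of the class being analyzed
        try (InputStream in = OutgoingCalls.class.getResourceAsStream("/com/example/PetController.class")) {
            ClassReader reader = new ClassReader(in);
            reader.accept(new ClassVisitor(Opcodes.ASM9) {
                @Override
                public MethodVisitor visitMethod(int access, String name, String descriptor,
                        String signature, String[] exceptions) {
                    System.out.println("Method: " + name + descriptor);
                    return new MethodVisitor(Opcodes.ASM9) {
                        @Override
                        public void visitMethodInsn(int opcode, String owner, String methodName,
                                String methodDescriptor, boolean isInterface) {
                            // Each method invocation instruction reveals a callee -
                            // a building block of a call graph
                            System.out.println("  calls " + owner + "." + methodName + methodDescriptor);
                        }
                    };
                }
            }, 0);
        }
    }

}
```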
There are many GenAI models out there - cloud and self-hosted. Which one to use heavily depends on the context. For example, if you have a large codebase with a considerable amount of technical debt, having an on-prem model may be a good choice because:
In this scenario your cost is on-prem infrastructure and power. Your savings are not having to pay for GenAI in the cloud, plus developer productivity gains if your fine-tuned model turns out to be more efficient than a “vanilla” LLM.
There are many other considerations, of course!
In this section we’ll take a look just at the bytecode and coverage results delivered to the test generator. The generator operates on models. As such, bytecode and coverage results can be delivered in a “raw” format to be loaded into a model by the generator, or pre-loaded into a model and saved to a file. The second option results in fewer files to pass to the test generator. The model file can be in XMI format or in compressed binary; the XMI format is human-readable, while the binary format takes less space on disk.
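Assuming the models are EMF-based, the choice between the two formats boils down to which Resource implementation serializes the model. The sketch below, with arbitrary file names, saves the same model root both ways.

```java
import java.io.IOException;
import java.util.Map;

import org.eclipse.emf.common.util.URI;
import org.eclipse.emf.ecore.EObject;
import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.emf.ecore.resource.ResourceSet;
import org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl;
import org.eclipse.emf.ecore.resource.impl.ResourceSetImpl;
import org.eclipse.emf.ecore.xmi.impl.XMIResourceFactoryImpl;

public class ModelSerialization {

    public static void save(EObject modelRoot) throws IOException {
        ResourceSet resourceSet = new ResourceSetImpl();
        Map<String, Object> factories = resourceSet.getResourceFactoryRegistry().getExtensionToFactoryMap();
        factories.put("xmi", new XMIResourceFactoryImpl());                // human-readable XMI
        factories.put("bin", (Resource.Factory) BinaryResourceImpl::new);  // compact binary

        // Human-readable XMI - easier to inspect and diff
        Resource xmiResource = resourceSet.createResource(URI.createFileURI("target/coverage.xmi"));
        xmiResource.getContents().add(modelRoot);
        xmiResource.save(null);

        // Binary - smaller on disk and faster to load
        // (adding the root to this resource moves it out of the XMI resource,
        // which is fine because the XMI resource has already been saved)
        Resource binaryResource = resourceSet.createResource(URI.createFileURI("target/coverage.bin"));
        binaryResource.getContents().add(modelRoot);
        binaryResource.save(null);
    }

}
```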
For local development the build machine is the same machine where the developer creates sources. The test generator is also executed on the developer’s workstation. As such, the delivery channel is the file system.
In the case of a CI/CD pipeline/build server such as Jenkins or GitHub Actions, a version control system is the delivery channel.
The test generator needs coverage results. If the coverage results are delivered in the raw form, it also needs bytecode (class files) to make sense of the results.
Coverage results can be delivered to the test generator using the following channels:
The goal is to deliver generated tests to the developer, make the developer aware that they are available, and possibly track progress of incorporating the generated tests into the test suite. With this in mind, there are the following alternatives/options:
One option is to commit generated tests to source control with the @Disabled annotation so they are clearly visible in the test execution tree, and with the @Generated annotation to track changes and merge generated and hand-crafted code.

Issue trackers and messaging systems may be used to deliver generated documentation, while source control will deliver generated tests. Developers will use the generated documentation, such as graphs, sequence diagrams and GenAI explanations/recommendations, in conjunction with the generated test code.
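As an illustration of the annotation-based option above, a committed generated test might look roughly like this. The class and method names are illustrative, and jakarta.annotation.Generated is just one possible marker annotation - any source-retained marker would serve the tracking purpose.

```java
import org.junit.jupiter.api.Disabled;
import org.junit.jupiter.api.Test;

import jakarta.annotation.Generated;

@Generated("com.example.TestGenerator") // hypothetical generator id used to track generated code
public class SamplePetControllerTests {

    @Test
    @Disabled("Generated test - review, adjust assertions and enable")
    public void testGetPet() {
        // ... generated test body exercising the target method identified from coverage data ...
    }

}
```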
This channel may implement some sort of backpressure by saying “it is enough for now”, as a human developer would by crying “Enough is enough, I have other stories to work on in this sprint!”. Generating just enough tests is beneficial in the following ways:
With backpressure a question of prioritization/sorting arises - what to work on first? Source methods can be sorted according to:
One strategy might be to work on callee methods first (method a) to provide a solid foundation. Another is to work on caller methods first because callee methods might be tested along the way.
These strategies might be combined - some developers (say, junior) may work on callee tests while senior developers may be assigned to test (complex) caller (top-level) methods. Also, the top-down approach (callers first) might be better for addressing technical debt accrued over time, while bottom-up (callees first) might be better for new development.
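As a sketch of what such prioritization could look like in code, the snippet below orders JaCoCo method coverage nodes by missed lines and then by cyclomatic complexity. These sort keys are assumptions, not a prescription.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

import org.jacoco.core.analysis.IMethodCoverage;

public class Prioritization {

    // Order candidate methods: most missed lines first, then highest cyclomatic complexity
    public static List<IMethodCoverage> prioritize(List<IMethodCoverage> candidates) {
        Comparator<IMethodCoverage> byMissedLines =
                Comparator.comparingInt(m -> m.getLineCounter().getMissedCount());
        Comparator<IMethodCoverage> byComplexity =
                Comparator.comparingInt(m -> m.getComplexityCounter().getTotalCount());
        return candidates.stream()
                .sorted(byMissedLines.reversed().thenComparing(byComplexity.reversed()))
                .collect(Collectors.toList());
    }

}
```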
GenAI is neither free nor blazing fast. As such, this channel may implement: