Automated, complex processing of large volumes of information that is most efficiently processed without user interaction. These operations typically include time-based events (such as month-end calculations, notices, or correspondence).
Periodic application of complex business rules processed repetitively across very large data sets (for example, insurance benefit determination or rate adjustments).
Integration of information received from internal and external systems, which typically requires formatting, validation, and transactional processing into the system of record. Batch processing is used to process billions of transactions every day for enterprises.
Spring Batch is a lightweight, comprehensive batch framework designed to enable the development of robust batch applications that are vital for the daily operations of enterprise systems. Spring Batch builds upon the characteristics of the Spring Framework that people have come to expect (productivity, POJO-based development approach, and general ease of use), while making it easy for developers to access and use more advanced enterprise services when necessary. Spring Batch is not a scheduling framework. There are many good enterprise schedulers (such as Quartz, Tivoli, Control-M, and others) available in both the commercial and open source spaces. Spring Batch is intended to work in conjunction with a scheduler rather than replace a scheduler.
Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging and tracing, transaction management, job processing statistics, job restart, skip, and resource management. It also provides more advanced technical services and features that enable extremely high-volume and high performance batch jobs through optimization and partitioning techniques. You can use Spring Batch in both simple use cases (such as reading a file into a database or running a stored procedure) and complex, high volume use cases (such as moving high volumes of data between databases, transforming it, and so on). High-volume batch jobs can use the framework in a highly scalable manner to process significant volumes of information.
While open source software projects and associated communities have focused greater attention on web-based and microservices-based architecture frameworks, there has been a notable lack of focus on reusable architecture frameworks to accommodate Java-based batch processing needs, despite continued needs to handle such processing within enterprise IT environments. The lack of a standard, reusable batch architecture has resulted in the proliferation of many one-off, in-house solutions developed within client enterprise IT functions.
SpringSource (now VMware) and Accenture collaborated to change this. Accenture’s hands-on industry and technical experience in implementing batch architectures, SpringSource’s depth of technical experience, and Spring’s proven programming model together made a natural and powerful partnership to create high-quality, market-relevant software aimed at filling an important gap in enterprise Java. Both companies worked with a number of clients who were solving similar problems by developing Spring-based batch architecture solutions. This input provided some useful additional detail and real-life constraints that helped to ensure the solution can be applied to the real-world problems posed by clients.
Accenture contributed previously proprietary batch processing architecture frameworks to the Spring Batch project, along with committer resources to drive support, enhancements, and the existing feature set. Accenture’s contribution was based upon decades of experience in building batch architectures with the last several generations of platforms: COBOL on mainframes, C++ on Unix, and, now, Java anywhere.
The collaborative effort between Accenture and SpringSource aimed to promote the standardization of software processing approaches, frameworks, and tools enterprise users can consistently use when creating batch applications. Companies and government agencies desiring to deliver standard, proven solutions to their enterprise IT environments can benefit from Spring Batch.
A typical batch program generally reads a large number of records from a database, file, or queue, processes the data in some fashion, and then writes back the data in a modified form.
Spring Batch automates this basic batch iteration, providing the capability to process similar transactions as a set, typically in an offline environment without any user interaction. Batch jobs are part of most IT projects, and Spring Batch is the only open source framework that provides a robust, enterprise-scale solution.
Spring Batch supports a wide variety of batch business scenarios and has the following technical objectives:
Let batch developers use the Spring programming model: Concentrate on business logic and let the framework take care of the infrastructure.
Provide clear separation of concerns between the infrastructure, the batch execution environment, and the batch application.
Provide common, core execution services as interfaces that all projects can implement.
Provide simple and default implementations of the core execution interfaces that can be used “out of the box”.
Make it easy to configure, customize, and extend services, by using the Spring framework in all layers.
All existing core services should be easy to replace or extend, without any impact to the infrastructure layer.
Provide a simple deployment model, with the architecture JARs completely separate from the application, built by using Maven.
Spring Batch is designed with extensibility and a diverse group of end users in mind. The following image shows the layered architecture that supports the extensibility and ease of use for end-user developers.
This layered architecture highlights three major high-level components: Application, Core, and Infrastructure. The application contains all batch jobs and custom code written by developers using Spring Batch. The Batch Core contains the core runtime classes necessary to launch and control a batch job. It includes implementations for JobLauncher, Job, and Step. Both Application and Core are built on top of a common infrastructure. This infrastructure contains common readers, writers, and services (such as the RetryTemplate), which are used both by application developers (readers and writers, such as ItemReader and ItemWriter) and by the core framework itself (retry, which is its own library).
The following key principles, guidelines, and general considerations should be considered when building a batch solution.
Remember that a batch architecture typically affects on-line architecture and vice versa. Design with both architectures and environments in mind by using common building blocks when possible.
Simplify as much as possible and avoid building complex logical structures in single batch applications.
Keep the processing and storage of data physically close together (in other words, keep your data where your processing occurs).
Minimize system resource use, especially I/O. Perform as many operations as possible in internal memory.
Review application I/O (analyze SQL statements) to ensure that unnecessary physical I/O is avoided. In particular, look for the following four common flaws:
Reading data for every transaction when the data could be read once and cached or kept in the working storage.
Rereading data for a transaction where the data was read earlier in the same transaction.
Causing unnecessary table or index scans.
Not specifying key values in the WHERE clause of an SQL statement.
Do not do things twice in a batch run. For instance, if you need data summarization for reporting purposes, you should (if possible) increment stored totals when data is being initially processed, so your reporting application does not have to reprocess the same data.
Allocate enough memory at the beginning of a batch application to avoid time-consuming reallocation during the process.
Always assume the worst with regard to data integrity. Insert adequate checks and record validation to maintain data integrity.
Implement checksums for internal validation where possible. For example, flat files should have a trailer record telling the total of records in the file and an aggregate of the key fields.
Plan and execute stress tests as early as possible in a production-like environment with realistic data volumes.
In large batch systems, backups can be challenging, especially if the system is running concurrent with online applications on a 24-7 basis. Database backups are typically well taken care of in online design, but file backups should be considered to be just as important. If the system depends on flat files, file backup procedures should not only be in place and documented but be regularly tested as well.
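As an illustration of the trailer-record guideline above, the following minimal sketch validates a hypothetical pipe-delimited file whose last line is a trailer carrying the expected record count and an aggregate of an amount field; the layout is an assumption for the example, not a prescribed format.

import java.io.IOException;
import java.math.BigDecimal;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class TrailerRecordValidator {

    // Validates a flat file whose last line is a trailer of the form:
    // T|<recordCount>|<totalAmount>
    // Data lines are assumed to be: D|<recordId>|<amount>
    public static void validate(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file);
        String[] trailerFields = lines.get(lines.size() - 1).split("\\|");
        long expectedCount = Long.parseLong(trailerFields[1]);
        BigDecimal expectedTotal = new BigDecimal(trailerFields[2]);

        long actualCount = 0;
        BigDecimal actualTotal = BigDecimal.ZERO;
        for (String line : lines.subList(0, lines.size() - 1)) {
            String[] fields = line.split("\\|");
            actualCount++;
            actualTotal = actualTotal.add(new BigDecimal(fields[2]));
        }

        if (actualCount != expectedCount || actualTotal.compareTo(expectedTotal) != 0) {
            throw new IllegalStateException("Trailer mismatch: expected " + expectedCount
                    + "/" + expectedTotal + " but found " + actualCount + "/" + actualTotal);
        }
    }
}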
To help design and implement batch systems, basic batch application building blocks and patterns should be provided to the designers and programmers in the form of sample structure charts and code shells. When starting to design a batch job, the business logic should be decomposed into a series of steps that can be implemented by using the following standard building blocks:
Conversion Applications: For each type of file supplied by or generated for an external system, a conversion application must be created to convert the transaction records supplied into a standard format required for processing. This type of batch application can partly or entirely consist of translation utility modules (see Basic Batch Services).
Validation Applications: A validation application ensures that all input and output records are correct and consistent. Validation is typically based on file headers and trailers, checksums and validation algorithms, and record-level cross-checks.
Extract Applications: An extract application reads a set of records from a database or input file, selects records based on predefined rules, and writes the records to an output file.
Extract/Update Applications: An extract/update application reads records from a database or an input file and makes changes to a database or an output file, driven by the data found in each input record.
Processing and Updating Applications: A processing and updating application performs processing on input transactions from an extract or a validation application. The processing usually involves reading a database to obtain data required for processing, potentially updating the database and creating records for output processing.
Output/Format Applications: An output/format application reads an input file, restructures the data from each record according to a standard format, and produces an output file for printing or transmission to another program or system.
Additionally, a basic application shell should be provided for business logic that cannot be built by using the previously mentioned building blocks.
In addition to the main building blocks, each application may use one or more standard utility steps, such as:
Sort: A program that reads an input file and produces an output file where records have been re-sequenced according to a sort key field in the records. Sorts are usually performed by standard system utilities.
Split: A program that reads a single input file and writes each record to one of several output files based on a field value. Splits can be tailored or performed by parameter-driven standard system utilities.
Merge: A program that reads records from multiple input files and produces one output file with combined data from the input files. Merges can be tailored or performed by parameter-driven standard system utilities.
The foundation of any batch system is the processing strategy. Factors affecting the selection of the strategy include: estimated batch system volume, concurrency with online systems or with other batch systems, available batch windows. (Note that, with more enterprises wanting to be up and running 24x7, clear batch windows are disappearing).
Typical processing options for batch are (in increasing order of implementation complexity):
The remainder of this section discusses these processing options in more detail. Note that, as a rule of thumb, the commit and locking strategy adopted by batch processes depends on the type of processing performed and that the online locking strategy should also use the same principles. Therefore, the batch architecture cannot be simply an afterthought when designing an overall architecture.
The locking strategy can be to use only normal database locks or to implement an additional custom locking service in the architecture. The locking service would track database locking (for example, by storing the necessary information in a dedicated database table) and give or deny permissions to the application programs requesting a database operation. Retry logic could also be implemented by this architecture to avoid aborting a batch job in case of a lock situation.
1. Normal processing in a batch window
For simple batch processes running in a separate batch window where the data being updated is not required by online users or other batch processes, concurrency is not an issue and a single commit can be done at the end of the batch run.
In most cases, a more robust approach is more appropriate. Keep in mind that batch systems have a tendency to grow as time goes by, both in terms of complexity and the data volumes they handle. If no locking strategy is in place and the system still relies on a single commit point, modifying the batch programs can be painful. Therefore, even with the simplest batch systems, consider the need for commit logic for restart-recovery options as well as the information concerning the more complex cases described later in this section.
2. Concurrent batch or on-line processing
Batch applications processing data that can be simultaneously updated by online users should not lock any data (either in the database or in files) that could be required by on-line users for more than a few seconds. Also, updates should be committed to the database at the end of every few transactions. Doing so minimizes the portion of data that is unavailable to other processes and the elapsed time the data is unavailable.
Another option to minimize physical locking is to have logical row-level locking implemented with either an optimistic locking pattern or a pessimistic locking pattern.
Optimistic locking assumes a low likelihood of record contention. It typically means inserting a timestamp column in each database table that is used concurrently by both batch and online processing. When an application fetches a row for processing, it also fetches the timestamp. As the application then tries to update the processed row, the update uses the original timestamp in the WHERE clause. If the timestamp matches, the data and the timestamp are updated. If the timestamp does not match, this indicates that another application has updated the same row between the fetch and the update attempt. Therefore, the update cannot be performed.
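A minimal JDBC sketch of the pattern follows; the ACCOUNT table with BALANCE and LAST_UPDATED columns is a hypothetical example, and in practice the fetch and the update would run inside the same transaction as the business processing.

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;

public class OptimisticLockingExample {

    // Fetches the row (including its timestamp), then updates it only if the
    // timestamp has not been changed by another process in the meantime.
    public static boolean updateBalance(Connection connection, long accountId, BigDecimal newBalance)
            throws SQLException {
        Timestamp originalTimestamp;
        try (PreparedStatement select = connection.prepareStatement(
                "SELECT LAST_UPDATED FROM ACCOUNT WHERE ID = ?")) {
            select.setLong(1, accountId);
            try (ResultSet rs = select.executeQuery()) {
                if (!rs.next()) {
                    return false; // row not found
                }
                originalTimestamp = rs.getTimestamp("LAST_UPDATED");
            }
        }

        try (PreparedStatement update = connection.prepareStatement(
                "UPDATE ACCOUNT SET BALANCE = ?, LAST_UPDATED = CURRENT_TIMESTAMP "
                        + "WHERE ID = ? AND LAST_UPDATED = ?")) {
            update.setBigDecimal(1, newBalance);
            update.setLong(2, accountId);
            update.setTimestamp(3, originalTimestamp);
            // zero updated rows means another process changed the row first
            return update.executeUpdate() == 1;
        }
    }
}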
Pessimistic locking is any locking strategy that assumes there is a high likelihood of record contention and, therefore, either a physical or a logical lock needs to be obtained at retrieval time. One type of pessimistic logical locking uses a dedicated lock column in the database table. When an application retrieves the row for update, it sets a flag in the lock column. With the flag in place, other applications attempting to retrieve the same row logically fail. When the application that set the flag updates the row, it also clears the flag, enabling the row to be retrieved by other applications. Note that the integrity of the data must also be maintained between the initial fetch and the setting of the flag (for example, by using database locks such as SELECT FOR UPDATE). Note also that this method suffers from the same downside as physical locking, except that it is somewhat easier to build a time-out mechanism that releases the lock if the user goes to lunch while the record is locked.
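The lock-column variant could look roughly like the following JDBC sketch; the ORDERS table and LOCKED_BY column are assumptions, and the SELECT FOR UPDATE protects the window between reading the row and setting the logical flag within one transaction.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class LogicalLockExample {

    // Tries to claim the row for this processor by setting a logical lock flag.
    // Returns true if the lock was acquired, false if another processor holds it.
    public static boolean acquireLock(Connection connection, long orderId, String processorId)
            throws SQLException {
        // Physical lock protects the fetch-then-flag window until commit.
        try (PreparedStatement select = connection.prepareStatement(
                "SELECT LOCKED_BY FROM ORDERS WHERE ID = ? FOR UPDATE")) {
            select.setLong(1, orderId);
            try (ResultSet rs = select.executeQuery()) {
                if (!rs.next() || rs.getString("LOCKED_BY") != null) {
                    return false; // missing row or already logically locked
                }
            }
        }
        try (PreparedStatement update = connection.prepareStatement(
                "UPDATE ORDERS SET LOCKED_BY = ? WHERE ID = ?")) {
            update.setString(1, processorId);
            update.setLong(2, orderId);
            return update.executeUpdate() == 1;
        }
    }

    // Clears the logical lock once processing is finished.
    public static void releaseLock(Connection connection, long orderId) throws SQLException {
        try (PreparedStatement update = connection.prepareStatement(
                "UPDATE ORDERS SET LOCKED_BY = NULL WHERE ID = ?")) {
            update.setLong(1, orderId);
            update.executeUpdate();
        }
    }
}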
These patterns are not necessarily suitable for batch processing, but they might be used for concurrent batch and online processing (such as in cases where the database does not support row-level locking). As a general rule, optimistic locking is more suitable for online applications, while pessimistic locking is more suitable for batch applications. Whenever logical locking is used, the same scheme must be used for all applications that access the data entities protected by logical locks.
Note that both of these solutions only address locking a single record. Often, we may need to lock a logically related group of records. With physical locks, you have to manage these very carefully to avoid potential deadlocks. With logical locks, it is usually best to build a logical lock manager that understands the logical record groups you want to protect and that can ensure that locks are coherent and non-deadlocking. This logical lock manager usually uses its own tables for lock management, contention reporting, time-out mechanism, and other concerns.
3. Parallel Processing
Parallel processing lets multiple batch runs or jobs run in parallel to minimize the total elapsed batch processing time. This is not a problem as long as the jobs are not sharing the same files, database tables, or index spaces. If they do, this service should be implemented by using partitioned data. Another option is to build an architecture module for maintaining interdependencies by using a control table. A control table should contain a row for each shared resource and whether it is in use by an application or not. The batch architecture or the application in a parallel job would then retrieve information from that table to determine whether it can get access to the resource it needs.
If the data access is not a problem, parallel processing can be implemented through the use of additional threads to process in parallel. In a mainframe environment, parallel job classes have traditionally been used, to ensure adequate CPU time for all the processes. Regardless, the solution has to be robust enough to ensure time slices for all the running processes.
Other key issues in parallel processing include load balancing and the availability of general system resources, such as files, database buffer pools, and so on. Also, note that the control table itself can easily become a critical resource.
4. Partitioning
Using partitioning lets multiple versions of large batch applications run concurrently. The purpose of this is to reduce the elapsed time required to process long batch jobs. Processes that can be successfully partitioned are those where the input file can be split or the main database tables partitioned to let the application run against different sets of data.
In addition, processes that are partitioned must be designed to process only their assigned data set. A partitioning architecture has to be closely tied to the database design and the database partitioning strategy. Note that database partitioning does not necessarily mean physical partitioning of the database (although, in most cases, this is advisable). The following image illustrates the partitioning approach:
The architecture should be flexible enough to allow dynamic configuration of the number of partitions. You should consider both automatic and user-controlled configuration. Automatic configuration may be based on such parameters as the input file size and the number of input records.
4.1 Partitioning Approaches
Selecting a partitioning approach has to be done on a case-by-case basis. The following list describes some of the possible partitioning approaches:
1. Fixed and Even Break-Up of Record Set
This involves breaking the input record set into an even number of portions (for example, 10, where each portion has exactly 1/10th of the entire record set). Each portion is then processed by one instance of the batch/extract application.
To use this approach, preprocessing is required to split the record set up. The result of this split is a lower and upper bound placement number that you can use as input to the batch/extract application to restrict its processing to only its portion.
Preprocessing could be a large overhead, as it has to calculate and determine the bounds of each portion of the record set.
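A minimal sketch of such a preprocessing calculation is shown below; it simply derives a lower (inclusive) and upper (exclusive) record-position bound for each portion, and the record count and partition count used here are illustrative values.

public class PartitionBoundsCalculator {

    // Splits [0, totalRecords) into partitionCount contiguous ranges,
    // spreading any remainder over the first partitions.
    public static long[][] computeBounds(long totalRecords, int partitionCount) {
        long[][] bounds = new long[partitionCount][2];
        long baseSize = totalRecords / partitionCount;
        long remainder = totalRecords % partitionCount;
        long low = 0;
        for (int i = 0; i < partitionCount; i++) {
            long size = baseSize + (i < remainder ? 1 : 0);
            bounds[i][0] = low;          // inclusive lower bound
            bounds[i][1] = low + size;   // exclusive upper bound
            low += size;
        }
        return bounds;
    }

    public static void main(String[] args) {
        // 10 partitions over 1,000,003 records: each instance gets roughly 100,000 records.
        for (long[] b : computeBounds(1_000_003L, 10)) {
            System.out.println("process records [" + b[0] + ", " + b[1] + ")");
        }
    }
}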
2. Break up by a Key Column
This involves breaking up the input record set by a key column, such as a location code, and assigning data from each key to a batch instance. To achieve this, column values can be either:
Assigned to a batch instance through a partitioning table (option 1).
Assigned to a batch instance by a portion of the value, such as 0000-0999 or 1000-1999 (option 2).
Under option 1, adding new values means a manual reconfiguration of the batch or extract to ensure that the new value is added to a particular instance.
Under option 2, this ensures that all values are covered by an instance of the batch job. However, the number of values processed by one instance is dependent on the distribution of column values (there may be a large number of locations in the 0000-0999 range and few in the 1000-1999 range). Under this option, the data range should be designed with partitioning in mind.
Under both options, the optimal even distribution of records to batch instances cannot be realized. There is no dynamic configuration of the number of batch instances used.
3. Breakup by Views
This approach is basically breakup by a key column but on the database level. It involves breaking up the record set into views. These views are used by each instance of the batch application during its processing. The breakup is done by grouping the data.
With this option, each instance of a batch application has to be configured to hit a particular view (instead of the main table). Also, with the addition of new data values, this new group of data has to be included into a view. There is no dynamic configuration capability, as a change in the number of instances results in a change to the views.
4. Addition of a Processing Indicator
This involves the addition of a new column to the input table, which acts as an indicator. As a preprocessing step, all indicators are marked as being non-processed. During the record fetch stage of the batch application, records are read on the condition that an individual record is marked as being non-processed, and, once it is read (with lock), it is marked as being in processing. When that record is completed, the indicator is updated to either complete or error. You can start many instances of a batch application without a change, as the additional column ensures that a record is only processed once.
With this option, I/O on the table increases dynamically. In the case of an updating batch application, this impact is reduced, as a write must occur anyway.
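A hedged JDBC sketch of the indicator approach follows; the ORDERS table and the NEW, PROCESSING, COMPLETE, and ERROR status values are assumptions, and the exact row-limiting and locking syntax (FETCH FIRST ... FOR UPDATE) varies by database.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ProcessingIndicatorExample {

    // Claims the next unprocessed row: reads it with a lock and marks it as
    // in-processing, so concurrent instances never pick up the same row.
    public static Long claimNextRecord(Connection connection) throws SQLException {
        try (PreparedStatement select = connection.prepareStatement(
                "SELECT ID FROM ORDERS WHERE STATUS = 'NEW' FETCH FIRST 1 ROWS ONLY FOR UPDATE");
             ResultSet rs = select.executeQuery()) {
            if (!rs.next()) {
                return null; // nothing left to process
            }
            long id = rs.getLong("ID");
            try (PreparedStatement update = connection.prepareStatement(
                    "UPDATE ORDERS SET STATUS = 'PROCESSING' WHERE ID = ?")) {
                update.setLong(1, id);
                update.executeUpdate();
            }
            return id;
        }
    }

    // Marks the record as complete (or error) once processing is done.
    public static void markResult(Connection connection, long id, boolean success) throws SQLException {
        try (PreparedStatement update = connection.prepareStatement(
                "UPDATE ORDERS SET STATUS = ? WHERE ID = ?")) {
            update.setString(1, success ? "COMPLETE" : "ERROR");
            update.setLong(2, id);
            update.executeUpdate();
        }
    }
}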
5. Extract Table to a Flat File
This approach involves the extraction of the table into a flat file. This file can then be split into multiple segments and used as input to the batch instances.
With this option, the additional overhead of extracting the table into a file and splitting it may cancel out the effect of multi-partitioning. Dynamic configuration can be achieved by changing the file splitting script.
6. Use of a Hashing Column
This scheme involves the addition of a hash column (key or index) to the database tables used to retrieve the driver record. This hash column has an indicator to determine which instance of the batch application processes this particular row. For example, if there are three batch instances to be started, an indicator of 'A' marks a row for processing by instance 1, an indicator of 'B' marks a row for processing by instance 2, and an indicator of 'C' marks a row for processing by instance 3.
The procedure used to retrieve the records would then have an additional WHERE clause to select all rows marked by a particular indicator. The inserts in this table would involve the addition of the marker field, which would be defaulted to one of the instances (such as 'A').
A simple batch application would be used to update the indicators, such as to redistribute the load between the different instances. When a sufficiently large number of new rows have been added, this batch can be run (anytime, except in the batch window) to redistribute the new rows to other instances.
Additional instances of the batch application require only the running of the batch application (as described in the preceding paragraphs) to redistribute the indicators to work with a new number of instances.
4.2 Database and Application Design Principles
An architecture that supports multi-partitioned applications that run against partitioned database tables and use the key column approach should include a central partition repository for storing partition parameters. This provides flexibility and ensures maintainability. The repository generally consists of a single table, known as the partition table.
Information stored in the partition table is static and, in general, should be maintained by the DBA. The table should consist of one row of information for each partition of a multi-partitioned application. The table should have columns for Program ID Code, Partition Number (the logical ID of the partition), Low Value of the database key column for this partition, and High Value of the database key column for this partition.
On program start-up, the program ID and partition number should be passed to the application from the architecture (specifically, from the control processing tasklet). If a key column approach is used, these variables are used to read the partition table to determine what range of data the application is to process. In addition, the partition number must be used throughout the processing to:
When applications run in parallel or are partitioned, contention for database resources and deadlocks may occur. It is critical that the database design team eliminate potential contention situations as much as possible, as part of the database design.
Also, the developers must ensure that the database index tables are designed with deadlock prevention and performance in mind.
Deadlocks or hot spots often occur in administration or architecture tables, such as log tables, control tables, and lock tables. The implications of these should be taken into account as well. Realistic stress tests are crucial for identifying the possible bottlenecks in the architecture.
To minimize the impact of conflicts on data, the architecture should provide services (such as wait-and-retry intervals) when attaching to a database or when encountering a deadlock. This means a built-in mechanism to react to certain database return codes and, instead of issuing an immediate error, waiting a predetermined amount of time and retrying the database operation.
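As one way to sketch such a wait-and-retry service, Spring Retry's RetryTemplate (the same retry library referenced in the layered architecture above) can be configured to retry on a deadlock exception with a fixed back-off; the retry count, back-off period, and exception type below are illustrative assumptions.

import java.util.Map;

import javax.sql.DataSource;

import org.springframework.dao.DeadlockLoserDataAccessException;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.retry.backoff.FixedBackOffPolicy;
import org.springframework.retry.policy.SimpleRetryPolicy;
import org.springframework.retry.support.RetryTemplate;

public class DeadlockRetryExample {

    // Builds a template that retries a database operation up to three times,
    // waiting two seconds between attempts, when a deadlock is reported.
    public static RetryTemplate deadlockRetryTemplate() {
        SimpleRetryPolicy retryPolicy = new SimpleRetryPolicy(
                3, Map.of(DeadlockLoserDataAccessException.class, true));
        FixedBackOffPolicy backOffPolicy = new FixedBackOffPolicy();
        backOffPolicy.setBackOffPeriod(2000L);

        RetryTemplate retryTemplate = new RetryTemplate();
        retryTemplate.setRetryPolicy(retryPolicy);
        retryTemplate.setBackOffPolicy(backOffPolicy);
        return retryTemplate;
    }

    // Usage: wrap the database call so that a deadlock is retried instead of
    // immediately failing the batch run.
    public static int updateWithRetry(DataSource dataSource, String sql) {
        JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
        return deadlockRetryTemplate().execute(context -> jdbcTemplate.update(sql));
    }
}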
4.4 Parameter Passing and Validation
The partition architecture should be relatively transparent to application developers. The architecture should perform all tasks associated with running the application in a partitioned mode, including:
If the database is partitioned, some additional validation may be necessary to ensure that a single partition does not span database partitions.
Also, the architecture should take into consideration the consolidation of partitions. Key questions include:
Spring Batch follows Spring Framework’s baselines for both Java version and third party dependencies. With Spring Batch 5, the Spring Framework version is being upgraded to Spring Framework 6, which requires Java 17. As a result, the Java version requirement for Spring Batch is also increasing to Java 17.
To continue the integration with supported versions of the third party libraries that Spring Batch uses, Spring Batch 5 is updating the dependencies across the board to the following versions:
Historically, Spring Batch provided a map-based job repository and job explorer implementations to work with an in-memory job repository. These implementations were deprecated in version 4 and completely removed in version 5. The recommended replacement is to use the JDBC-based implementations with an embedded database, such as H2, HSQL, and others.
In this release, the @EnableBatchProcessing annotation configures a JDBC-based JobRepository, which requires DataSource and PlatformTransactionManager beans to be defined in the application context. The DataSource bean could refer to an embedded database to work with an in-memory job repository.
Until version 4.3, the @EnableBatchProcessing annotation exposed a transaction manager bean in the application context. While this was convenient in many cases, the unconditional exposure of a transaction manager could interfere with a user-defined transaction manager. In this release, @EnableBatchProcessing no longer exposes a transaction manager bean in the application context.
In this release, the @EnableBatchProcessing annotation provides new attributes to specify which components and parameters should be used to configure the Batch infrastructure beans. For example, it is now possible to specify which data source and transaction manager Spring Batch should configure in the job repository, as follows:
@Configuration
@EnableBatchProcessing(dataSourceRef = "batchDataSource", transactionManagerRef = "batchTransactionManager")
public class MyJobConfiguration {

    @Bean
    public Job job(JobRepository jobRepository) {
        return new JobBuilder("myJob", jobRepository)
                // define job flow as needed
                .build();
    }
}
In this example, batchDataSource and batchTransactionManager refer to beans in the application context, which are used to configure the job repository and job explorer. There is no need to define a custom BatchConfigurer anymore; it was removed in this release.
2.3.4. New configuration class for infrastructure beans
In this release, a new configuration class named DefaultBatchConfiguration can be used as an alternative to @EnableBatchProcessing for the configuration of infrastructure beans. This class provides infrastructure beans with a default configuration that can be customized as needed. The following snippet shows a typical usage of this class:
@Configuration
class MyJobConfiguration extends DefaultBatchConfiguration {

    @Bean
    public Job job(JobRepository jobRepository) {
        return new JobBuilder("myJob", jobRepository)
                // define job flow as needed
                .build();
    }
}
In this example, the JobRepository bean injected in the Job bean definition is defined in the DefaultBatchConfiguration class. Custom parameters can be specified by overriding the corresponding getter. For example, the following example shows how to override the default character encoding used in the job repository and job explorer:
@Configuration
class MyJobConfiguration extends DefaultBatchConfiguration {

    @Bean
    public Job job(JobRepository jobRepository) {
        return new JobBuilder("job", jobRepository)
                // define job flow as needed
                .build();
    }

    @Override
    protected Charset getCharset() {
        return StandardCharsets.ISO_8859_1;
    }
}
2.4.1. Support for any type as a job parameter
This version adds support for using any type as a job parameter, not only the four pre-defined types (long, double, string, date) supported in v4. This change has an impact on how job parameters are persisted in the database (there are no longer four distinct columns, one per predefined type). Please check the column change in BATCH_JOB_EXECUTION_PARAMS for DDL changes. The fully qualified name of the parameter type is now persisted as a String, as well as the parameter value. String literals are converted to the parameter type with the standard Spring conversion service, which can be enriched with any converter required to convert user-specific types to and from String literals.
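For illustration, a typed parameter can be built programmatically with the JobParametersBuilder; this is a minimal sketch assuming the addJobParameter(name, value, type) variant added in v5, and the parameter names and values are made up for the example.

import java.time.LocalDate;

import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;

public class TypedJobParametersExample {

    public static JobParameters scheduleDateParameters() {
        // The third argument declares the parameter type; the fully qualified
        // type name is persisted alongside the value.
        return new JobParametersBuilder()
                .addJobParameter("schedule.date", LocalDate.of(2023, 1, 1), LocalDate.class)
                .addString("run.mode", "full")
                .toJobParameters();
    }
}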
2.4.2. Default job parameter conversion
The default notation of job parameters in v4 was specified as follows:
[+|-]parameterName(parameterType)=value
where parameterType is one of [string,long,double,date]. This notation is limited and constraining: it does not play well with environment variables and is not friendly with Spring Boot.
In v5, there are two ways to specify job parameters:
Default notation
The default notation is now specified as follows:
parameterName=parameterValue,parameterType,identificationFlag
where parameterType is the fully qualified name of the type of the parameter. Spring Batch provides the DefaultJobParametersConverter to support this notation.
Extended notation
While the default notation is well suited for the majority of use cases, it might not be convenient when, for example, the value contains a comma. In this case, you can use the extended notation, which is inspired by Spring Boot’s JSON application properties and is specified as follows:
parameterName='{"value": "parameterValue", "type":"parameterType", "identifying": "booleanValue"}'
where parameterType is the fully qualified name of the type of the parameter. Spring Batch provides the JsonJobParametersConverter to support this notation.
2.5. Execution context serialization updates
Starting from v5, the DefaultExecutionContextSerializer was updated to serialize and deserialize the context to and from Base64.
Moreover, the default ExecutionContextSerializer configured by @EnableBatchProcessing or DefaultBatchConfiguration was changed from JacksonExecutionContextStringSerializer to DefaultExecutionContextSerializer. The dependency on Jackson was made optional. In order to use the JacksonExecutionContextStringSerializer, jackson-core should be added to the classpath.
2.6. SystemCommandTasklet updates
The SystemCommandTasklet has been revisited in this release and was changed as follows:
A new strategy interface named CommandRunner was introduced in order to decouple the command execution from the tasklet execution. The default implementation is the JvmCommandRunner, which uses the java.lang.Runtime#exec API to run system commands. This interface can be implemented to use any other API to run system commands.
The method that runs the command now accepts an array of Strings representing the command and its arguments. There is no need anymore to tokenize the command or do any pre-processing. This change makes the API more intuitive and less prone to errors.
2.7.1. Removal of autowiring from test utilities
Up to version 4.3, the JobLauncherTestUtils and JobRepositoryTestUtils used to autowire the job under test as well as the test datasource to facilitate the testing infrastructure setup. While this was convenient for most use cases, it turned out to cause several issues for test contexts where multiple jobs or multiple data sources are defined.
In this release, we introduced a few changes to remove the autowiring of such dependencies in order to avoid any issues while importing those utilities either manually or through the @SpringBatchTest annotation.
2.7.2. Migration to JUnit Jupiter
In this release, the entire test suite of Spring Batch has been migrated to JUnit 5. While this does not impact end users directly, it helps the Batch team as well as community contributors to use the next generation of JUnit to write better tests.
2.9. Transaction support in JobExplorer and JobOperator
This release introduces transaction support in the JobExplorer created through the JobExplorerFactoryBean. It is now possible to specify which transaction manager to use to drive the read-only transactions when querying the Batch meta-data, as well as to customize the transaction attributes.
The same transaction support was added to the JobOperator through a new factory bean named JobOperatorFactoryBean.
2.9.1. Automatic registration of a JobOperator with EnableBatchProcessing
As of version 4, the EnableBatchProcessing annotation provided all the basic infrastructure beans that are required to launch Spring Batch jobs. However, it did not register a job operator bean, which is the main entry point to stop, restart, and abandon job executions.
While these utilities are not used as often as launching jobs, adding a job operator automatically in the application context can be useful to avoid a manual configuration of such a bean by end users.
2.9.2. Improved Java records support
The support for Java records as items in a chunk-oriented step was initially introduced in v4.3, but that support was limited by the fact that v4 has Java 8 as a baseline. The initial support was based on reflection tricks to create Java records and populate them with data, without having access to the java.lang.Record API that was finalised in Java 16.
Now that v5 has Java 17 as a baseline, we have improved records support in Spring Batch by leveraging the Record API in different parts of the framework. For example, the FlatFileItemReaderBuilder is now able to detect whether the item type is a record or a regular class and configure the corresponding FieldSetMapper implementation accordingly (that is, RecordFieldSetMapper for records and BeanWrapperFieldSetMapper for regular classes). The goal here is to make the configuration of the required FieldSetMapper type transparent to the user.
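A minimal sketch of this behavior is shown below; the Person record, file name, and column names are assumptions for the example, and because the target type is a record, the builder is expected to pick the record-aware FieldSetMapper without further configuration.

import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.core.io.ClassPathResource;

public class RecordReaderExample {

    // The item type is a record, so the builder can detect it and map
    // delimited fields onto the record components.
    public record Person(String firstName, String lastName, int age) {
    }

    public static FlatFileItemReader<Person> personReader() {
        return new FlatFileItemReaderBuilder<Person>()
                .name("personReader")
                .resource(new ClassPathResource("persons.csv"))
                .delimited()
                .names("firstName", "lastName", "age")
                .targetType(Person.class)
                .build();
    }
}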
2.9.3. Batch tracing with Micrometer
With the upgrade to Micrometer 1.10, you can now get Batch tracing in addition to Batch metrics. Spring Batch creates a span for each job and a span for each step within a job. This tracing meta-data can be collected and viewed on a dashboard, such as Zipkin.
Moreover, this release introduces new metrics, such as the currently active step and the job launch count, through the provided JobLauncher.
2.9.4. Java 8 features updates
We took the opportunity of this major release to improve the code base with features from Java 8+, for example:
2.9.6. Full support for MariaDB as a separate product
Up until v4.3, Spring Batch provided support for MariaDB by considering it as MySQL. In this release, MariaDB is treated as an independent product with its own DDL script and DataFieldMaxValueIncrementer.
2.9.7. New Maven Bill Of Materials for Spring Batch modules
This feature has been requested several times and is finally shipped in v5. It is now possible to use the newly added Maven BOM to import Spring Batch modules with a consistent version number.
2.9.8. UTF-8 by default
Several issues related to character encoding have been reported over the years in different areas of the framework, such as inconsistent default encoding between file-based item readers and writers, and serialization/deserialization issues when dealing with multi-byte characters in the execution context.
In the same spirit as JEP 400 and following the UTF-8 manifesto, this release updates the default encoding to UTF-8 in all areas of the framework and ensures this default is configurable as needed.
2.9.9. Full GraalVM native support
The effort to support compiling Spring Batch applications as native executables with the GraalVM native-image compiler started in v4.2 and shipped as experimental in v4.3.
In this release, the native support has been improved significantly by providing the necessary runtime hints to natively compile Spring Batch applications with GraalVM and is now considered out of beta.
2.9.10. Execution context Meta-data improvement
In addition to the runtime information Spring Batch already persists in the execution context (such as the step type and restart flag), this release adds an important detail to the execution context: the Spring Batch version that was used to serialize the context.
While this seems like a small detail, it has huge added value when debugging upgrade issues related to execution context serialization and deserialization.
2.9.11. Improved documentation
In this release, the documentation was updated to use the Spring Asciidoctor Backend. This backend ensures that all projects from the portfolio follow the same documentation style. For consistency with other projects, the reference documentation of Spring Batch was updated to use this backend in this release.
2.10.1. API deprecation and removal
In this major release, all APIs that were deprecated in previous versions have been removed. Moreover, some APIs have been deprecated in v5.0 and are scheduled for removal in v5.2. Finally, some APIs have been moved or removed without deprecation for practical reasons.
Please refer to the migration guide for more details about these changes.
2.10.2. SQLFire Support Removal
SQLFire was announced to be EOL as of November 1st, 2014. The support of SQLFire as a job repository was deprecated in version v4.3 and removed in version v5.0.
2.10.3. GemFire support removal
Based on the [decision to discontinue](https://github.com/spring-projects/spring-data-geode#notice) the support of Spring Data for Apache Geode, the support for Geode in Spring Batch was removed. The code was moved to the [spring-batch-extensions](https://github.com/spring-projects/spring-batch-extensions) repository as a community-driven effort.
2.10.4. JSR-352 Implementation Removal
Due to a lack of adoption, the implementation of JSR-352 has been discontinued in this release.
To any experienced batch architect, the overall concepts of batch processing used in Spring Batch should be familiar and comfortable. There are “Jobs” and “Steps” and developer-supplied processing units called ItemReader and ItemWriter. However, because of the Spring patterns, operations, templates, callbacks, and idioms, there are opportunities for the following:
Simple and default implementations that allow for quick adoption and ease of use out of the box.
Significantly enhanced extensibility.
The following diagram is a simplified version of the batch reference architecture that has been used for decades. It provides an overview of the components that make up the domain language of batch processing. This architecture framework is a blueprint that has been proven through decades of implementations on the last several generations of platforms (COBOL on mainframes, C on Unix, and now Java anywhere). JCL and COBOL developers are likely to be as comfortable with the concepts as C, C#, and Java developers. Spring Batch provides a physical implementation of the layers, components, and technical services commonly found in the robust, maintainable systems that are used to address the creation of simple to complex batch applications, with the infrastructure and extensions to address very complex processing needs.
The preceding diagram highlights the key concepts that make up the domain language of Spring Batch. A Job has one to many steps, each of which has exactly one ItemReader, one ItemProcessor, and one ItemWriter. A job needs to be launched (with JobLauncher), and metadata about the currently running process needs to be stored (in JobRepository).
3.1. Job
This section describes stereotypes relating to the concept of a batch job. A Job is an entity that encapsulates an entire batch process. As is common with other Spring projects, a Job is wired together with either an XML configuration file or Java-based configuration. This configuration may be referred to as the “job configuration”. However, Job is only the top of an overall hierarchy, as shown in the following diagram:
In Spring Batch, a Job is simply a container for Step instances. It combines multiple steps that logically belong together in a flow and allows for configuration of properties global to all steps, such as restartability. The job configuration contains:
For those who use Java configuration, Spring Batch provides a default implementation of the Job interface in the form of the SimpleJob class, which creates some standard functionality on top of Job. When using Java-based configuration, a collection of builders is made available for the instantiation of a Job, as the following example shows:
@Bean
public Job footballJob(JobRepository jobRepository) {
    return new JobBuilder("footballJob", jobRepository)
                     .start(playerLoad())
                     .next(gameLoad())
                     .next(playerSummarization())
                     .build();
}
For those who use XML configuration, Spring Batch provides a default implementation of the Job interface in the form of the SimpleJob class, which creates some standard functionality on top of Job. However, the batch namespace abstracts away the need to instantiate it directly. Instead, you can use the <job> element, as the following example shows:
<job id="footballJob">
    <step id="playerload" next="gameLoad"/>
    <step id="gameLoad" next="playerSummarization"/>
    <step id="playerSummarization"/>
</job>
3.1.1. JobInstance
A JobInstance refers to the concept of a logical job run. Consider a batch job that should be run once at the end of the day, such as the EndOfDay Job from the preceding diagram. There is one EndOfDay job, but each individual run of the Job must be tracked separately. In the case of this job, there is one logical JobInstance per day. For example, there is a January 1st run, a January 2nd run, and so on. If the January 1st run fails the first time and is run again the next day, it is still the January 1st run. (Usually, this corresponds with the data it is processing as well, meaning the January 1st run processes data for January 1st.) Therefore, each JobInstance can have multiple executions (JobExecution is discussed in more detail later in this chapter), and only one JobInstance (which corresponds to a particular Job and identifying JobParameters) can run at a given time.
The definition of a JobInstance has absolutely no bearing on the data to be loaded. It is entirely up to the ItemReader implementation to determine how data is loaded. For example, in the EndOfDay scenario, there may be a column on the data that indicates the effective date or schedule date to which the data belongs. So, the January 1st run would load only data from the 1st, and the January 2nd run would use only data from the 2nd. Because this determination is likely to be a business decision, it is left up to the ItemReader to decide. However, using the same JobInstance determines whether or not the “state” (that is, the ExecutionContext, which is discussed later in this chapter) from previous executions is used. Using a new JobInstance means “start from the beginning,” and using an existing instance generally means “start from where you left off”.
3.1.2. JobParameters
Having discussed JobInstance and how it differs from Job, the natural question to ask is: “How is one JobInstance distinguished from another?” The answer is: JobParameters. A JobParameters object holds a set of parameters used to start a batch job. They can be used for identification or even as reference data during the run, as the following image shows:
In the preceding example, where there are two instances, one for January 1st and another for January 2nd, there is really only one Job, but it has two JobParameter objects: one that was started with a job parameter of 01-01-2017 and another that was started with a parameter of 01-02-2017. Thus, the contract can be defined as: JobInstance = Job + identifying JobParameters. This allows a developer to effectively control how a JobInstance is defined, since they control what parameters are passed in.
Not all job parameters are required to contribute to the identification of a JobInstance. By default, they do so. However, the framework also allows the submission of a Job with parameters that do not contribute to the identity of a JobInstance.
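As a hedged sketch of this contract, the following code launches the same job twice with different identifying date parameters, which yields two distinct JobInstances; the job wiring and parameter name are illustrative.

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public class EndOfDayLauncherExample {

    // Each distinct set of identifying parameters yields a new JobInstance;
    // launching again with the same parameters targets the existing instance.
    public static void runTwoDays(JobLauncher jobLauncher, Job endOfDayJob) throws Exception {
        JobParameters january1 = new JobParametersBuilder()
                .addString("schedule.date", "2017-01-01")
                .toJobParameters();
        JobParameters january2 = new JobParametersBuilder()
                .addString("schedule.date", "2017-01-02")
                .toJobParameters();

        jobLauncher.run(endOfDayJob, january1); // JobInstance for 01-01-2017
        jobLauncher.run(endOfDayJob, january2); // separate JobInstance for 01-02-2017
    }
}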
3.1.3. JobExecution
A JobExecution refers to the technical concept of a single attempt to run a Job. An execution may end in failure or success, but the JobInstance corresponding to a given execution is not considered to be complete unless the execution completes successfully. Using the EndOfDay Job described previously as an example, consider a JobInstance for 01-01-2017 that failed the first time it was run. If it is run again with the same identifying job parameters as the first run (01-01-2017), a new JobExecution is created. However, there is still only one JobInstance.
A Job defines what a job is and how it is to be executed, and a JobInstance is a purely organizational object to group executions together, primarily to enable correct restart semantics. A JobExecution, however, is the primary storage mechanism for what actually happened during a run and contains many more properties that must be controlled and persisted, as the following table shows:
Table 1. JobExecution Properties
Status: A BatchStatus object that indicates the status of the execution. While running, it is BatchStatus#STARTED. If it fails, it is BatchStatus#FAILED. If it finishes successfully, it is BatchStatus#COMPLETED.
startTime: A java.time.LocalDateTime representing the current system time when the execution was started. This field is empty if the job has yet to start.
endTime: A java.time.LocalDateTime representing the current system time when the execution finished, regardless of whether or not it was successful. The field is empty if the job has yet to finish.
exitStatus: The ExitStatus, indicating the result of the run. It is most important, because it contains an exit code that is returned to the caller. See chapter 5 for more details. The field is empty if the job has yet to finish.
createTime: A java.time.LocalDateTime representing the current system time when the JobExecution was first persisted. The job may not have been started yet (and thus has no start time), but it always has a createTime, which is required by the framework for managing job-level ExecutionContexts.
lastUpdated: A java.time.LocalDateTime representing the last time a JobExecution was persisted. This field is empty if the job has yet to start.
executionContext: The “property bag” containing any user data that needs to be persisted between executions.
failureExceptions: The list of exceptions encountered during the execution of a Job. These can be useful if more than one exception is encountered during the failure of a Job.
These properties are important because they are persisted and can be used to completely determine the status of an execution. For example, if the
EndOfDay job for 01-01 is executed at 9:00 PM and fails at 9:30, the following entries are made in the batch metadata tables:
Table 2. BATCH_JOB_INSTANCE
Now that the job has failed, assume that it took the entire night for the problem to be determined, so that the “batch window” is now closed. Further assuming that the window starts at 9:00 PM, the job is kicked off again for 01-01, starting where it left off and completing successfully at 9:30. Because it is now the next day, the 01-02 job must be run as well, and it is kicked off just afterwards at 9:31 and completes in its normal one hour time at 10:30. There is no requirement that one JobInstance be kicked off after another, unless there is potential for the two jobs to attempt to access the same data, causing issues with locking at the database level. It is entirely up to the scheduler to determine when a Job should be run. Since they are separate JobInstances, Spring Batch makes no attempt to stop them from being run concurrently. (Attempting to run the same JobInstance while another is already running results in a JobExecutionAlreadyRunningException being thrown.) There should now be an extra entry in both the JobInstance and JobParameters tables and two extra entries in the JobExecution table, as shown in the following tables:
Table 5. BATCH_JOB_INSTANCE
3.2. Step
A Step is a domain object that encapsulates an independent, sequential phase of a batch job. Therefore, every Job is composed entirely of one or more steps. A Step contains all of the information necessary to define and control the actual batch processing. This is a necessarily vague description, because the contents of any given Step are at the discretion of the developer writing a Job. A Step can be as simple or complex as the developer desires. A simple Step might load data from a file into the database, requiring little or no code (depending upon the implementations used). A more complex Step may have complicated business rules that are applied as part of the processing. As with a Job, a Step has an individual StepExecution that correlates with a unique JobExecution, as the following image shows:
3.2.1. StepExecution
A StepExecution represents a single attempt to execute a Step. A new StepExecution is created each time a Step is run, similar to JobExecution. However, if a step fails to execute because the step before it fails, no execution is persisted for it. A StepExecution is created only when its Step is actually started.
Step executions are represented by objects of the StepExecution class. Each execution contains a reference to its corresponding step and JobExecution and transaction-related data, such as commit and rollback counts and start and end times. Additionally, each step execution contains an ExecutionContext, which contains any data a developer needs to have persisted across batch runs, such as statistics or state information needed to restart. The following table lists the properties for StepExecution:
Table 8. StepExecution Properties
Status: A BatchStatus object that indicates the status of the execution. While running, the status is BatchStatus.STARTED. If it fails, the status is BatchStatus.FAILED. If it finishes successfully, the status is BatchStatus.COMPLETED.
startTime: A java.time.LocalDateTime representing the current system time when the execution was started. This field is empty if the step has yet to start.
endTime: A java.time.LocalDateTime representing the current system time when the execution finished, regardless of whether or not it was successful. This field is empty if the step has yet to exit.
exitStatus: The ExitStatus indicating the result of the execution. It is most important, because it contains an exit code that is returned to the caller. See chapter 5 for more details. This field is empty if the job has yet to exit.
executionContext: The “property bag” containing any user data that needs to be persisted between executions.
readCount: The number of items that have been successfully read.
writeCount: The number of items that have been successfully written.
commitCount: The number of transactions that have been committed for this execution.
rollbackCount: The number of times the business transaction controlled by the Step has been rolled back.
readSkipCount: The number of times read has failed, resulting in a skipped item.
processSkipCount: The number of times process has failed, resulting in a skipped item.
filterCount: The number of items that have been “filtered” by the ItemProcessor.
writeSkipCount: The number of times write has failed, resulting in a skipped item.
3.3. ExecutionContext
An ExecutionContext represents a collection of key/value pairs that are persisted and controlled by the framework to give developers a place to store persistent state that is scoped to a StepExecution object or a JobExecution object. (For those familiar with Quartz, it is very similar to JobDataMap.) The best usage example is to facilitate restart. Using flat file input as an example, while processing individual lines, the framework periodically persists the ExecutionContext at commit points. Doing so lets the ItemReader store its state in case a fatal error occurs during the run or even if the power goes out. All that is needed is to put the current number of lines read into the context, as the following example shows, and the framework does the rest:
executionContext.putLong(getKey(LINES_READ_COUNT), reader.getPosition());
Using the EndOfDay example from the Job stereotypes section as an example, assume there is one step, loadData, that loads a file into the database. After the first failed run, the metadata tables would look like the following example:
Table 9. BATCH_JOB_INSTANCE
In the preceding case, the Step ran for 30 minutes and processed 40,321 “pieces”, which would represent lines in a file in this scenario. This value is updated just before each commit by the framework and can contain multiple rows corresponding to entries within the ExecutionContext. Being notified before a commit requires one of the various StepListener implementations (or an ItemStream), which are discussed in more detail later in this guide. As with the previous example, it is assumed that the Job is restarted the next day. When it is restarted, the values from the ExecutionContext of the last run are reconstituted from the database. When the ItemReader is opened, it can check to see if it has any stored state in the context and initialize itself from there, as the following example shows:
if (executionContext.containsKey(getKey(LINES_READ_COUNT))) {
    log.debug("Initializing for restart. Restart data is: " + executionContext);

    long lineCount = executionContext.getLong(getKey(LINES_READ_COUNT));

    LineReader reader = getReader();

    Object record = "";
    while (reader.getPosition() < lineCount && record != null) {
        record = readLine();
    }
}
In this case, after the preceding code runs, the current line is 40,322, letting the
Step
start again from where it left off. You can also use theExecutionContext
for statistics that need to be persisted about the run itself. For example, if a flat file contains orders for processing that exist across multiple lines, it may be necessary to store how many orders have been processed (which is much different from the number of lines read), so that an email can be sent at the end of theStep
with the total number of orders processed in the body. The framework handles storing this for the developer, to correctly scope it with an individualJobInstance
. It can be very difficult to know whether an existingExecutionContext
should be used or not. For example, using theEndOfDay
example from above, when the 01-01 run starts again for the second time, the framework recognizes that it is the sameJobInstance
and on an individualStep
basis, pulls theExecutionContext
out of the database, and hands it (as part of theStepExecution
) to theStep
itself. Conversely, for the 01-02 run, the framework recognizes that it is a different instance, so an empty context must be handed to theStep
. There are many of these types of determinations that the framework makes for the developer, to ensure the state is given to them at the correct time. It is also important to note that exactly oneExecutionContext
exists perStepExecution
at any given time. Clients of theExecutionContext
should be careful, because this creates a shared keyspace. As a result, care should be taken when putting values in to ensure no data is overwritten. However, theStep
stores absolutely no data in the context, so there is no way to adversely affect the framework.Note that there is at least one
ExecutionContext
perJobExecution
and one for everyStepExecution
. For example, consider the following code snippet:
ExecutionContext ecStep = stepExecution.getExecutionContext();
ExecutionContext ecJob = jobExecution.getExecutionContext();
// ecStep does not equal ecJob
As noted in the comment,
ecStep
does not equalecJob
. They are two differentExecutionContexts
. The one scoped to theStep
is saved at every commit point in theStep
, whereas the one scoped to the Job is saved in between everyStep
execution.
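As a rough illustration of the two scopes, the following sketch (the class name and context keys are hypothetical, not part of the framework) shows a StepExecutionListener that keeps a counter in the step-scoped context and publishes a summary value to the job-scoped context so that a later step can read it:

import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;

public class OrderCountListener implements StepExecutionListener {

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // step-scoped context: saved at every commit point of this step
        stepExecution.getExecutionContext().putLong("orders.processed", 0L);
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        long processed = stepExecution.getExecutionContext().getLong("orders.processed");
        // job-scoped context: saved between step executions and visible to later steps
        stepExecution.getJobExecution().getExecutionContext().putLong("orders.total", processed);
        return stepExecution.getExitStatus();
    }
}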
3.4. JobRepository
JobRepository
is the persistence mechanism for all of the stereotypes mentioned earlier. It provides CRUD operations forJobLauncher
,Job
, andStep
implementations. When aJob
is first launched, aJobExecution
is obtained from the repository. Also, during the course of execution,StepExecution
andJobExecution
implementations are persisted by passing them to the repository.
The Spring Batch XML namespace provides support for configuring a
JobRepository
instance with the<job-repository>
tag, as the following example shows:

<job-repository id="jobRepository"/>
When using Java configuration, the
@EnableBatchProcessing
annotation provides aJobRepository
as one of the components that is automatically configured.
3.5. JobLauncher
JobLauncher
represents a simple interface for launching aJob
with a given set ofJobParameters
, as the following example shows:

public interface JobLauncher {

    public JobExecution run(Job job, JobParameters jobParameters)
                throws JobExecutionAlreadyRunningException, JobRestartException,
                       JobInstanceAlreadyCompleteException, JobParametersInvalidException;
}
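As a rough usage sketch (the parameter names are illustrative, and the jobLauncher and job are assumed to be available as beans), launching a job and inspecting the resulting execution might look like this:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public class LaunchExample {

    public JobExecution launch(JobLauncher jobLauncher, Job job) throws Exception {
        JobParameters jobParameters = new JobParametersBuilder()
                .addString("input.file", "orders.csv") // hypothetical parameter
                .addLong("run.id", 1L)
                .toJobParameters();
        // run() blocks or returns immediately depending on the configured TaskExecutor
        return jobLauncher.run(job, jobParameters);
    }
}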
3.6. ItemReader
ItemReader
is an abstraction that represents the retrieval of input for aStep
, one item at a time. When theItemReader
has exhausted the items it can provide, it indicates this by returningnull
. You can find more details about theItemReader
interface and its various implementations in Readers And Writers.
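As a minimal sketch (the class name is illustrative; the framework ships its own implementations for files, databases, and more), an ItemReader that serves items from an in-memory list and signals exhaustion by returning null might look like this:

import java.util.Iterator;
import java.util.List;
import org.springframework.batch.item.ItemReader;

public class InMemoryStringReader implements ItemReader<String> {

    private final Iterator<String> iterator;

    public InMemoryStringReader(List<String> items) {
        this.iterator = items.iterator();
    }

    @Override
    public String read() {
        // returning null tells the framework that input is exhausted
        return iterator.hasNext() ? iterator.next() : null;
    }
}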
3.7. ItemWriter
ItemWriter
is an abstraction that represents the output of aStep
, one batch or chunk of items at a time. Generally, anItemWriter
has no knowledge of the input it should receive next and knows only the item that was passed in its current invocation. You can find more details about theItemWriter
interface and its various implementations in Readers And Writers.
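Assuming the Spring Batch 5 ItemWriter contract, which receives one Chunk of items per call, a minimal sketch (the class name is illustrative) might look like this:

import org.springframework.batch.item.Chunk;
import org.springframework.batch.item.ItemWriter;

public class ConsoleItemWriter implements ItemWriter<String> {

    @Override
    public void write(Chunk<? extends String> chunk) {
        // one chunk of items is passed per transaction
        chunk.forEach(item -> System.out.println("writing: " + item));
    }
}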
3.8. ItemProcessor
ItemProcessor
is an abstraction that represents the business processing of an item. While theItemReader
reads one item, and theItemWriter
writes one item, theItemProcessor
provides an access point to transform or apply other business processing. If, while processing the item, it is determined that the item is not valid, returningnull
indicates that the item should not be written out. You can find more details about theItemProcessor
interface in Readers And Writers.
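As a minimal sketch (the class name and rule are illustrative), an ItemProcessor that filters out blank lines by returning null, and trims the rest before they reach the ItemWriter, might look like this:

import org.springframework.batch.item.ItemProcessor;

public class BlankLineFilteringProcessor implements ItemProcessor<String, String> {

    @Override
    public String process(String line) {
        if (line == null || line.isBlank()) {
            return null; // filtered: this item is not passed to the ItemWriter
        }
        return line.trim();
    }
}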
3.9. Batch Namespace
Many of the domain concepts listed previously need to be configured in a Spring
ApplicationContext
. While there are implementations of the interfaces above that you can use in a standard bean definition, a namespace has been provided for ease of configuration, as the following example shows:

<beans:beans xmlns="http://www.springframework.org/schema/batch"
    xmlns:beans="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
        http://www.springframework.org/schema/beans
        https://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/batch
        https://www.springframework.org/schema/batch/spring-batch.xsd">

    <job id="ioSampleJob">
        <step id="step1">
            <tasklet>
                <chunk reader="itemReader" writer="itemWriter" commit-interval="2"/>
            </tasklet>
        </step>
    </job>

</beans:beans>
As long as the batch namespace has been declared, any of its elements can be used. You can find more information on configuring a Job in Configuring and Running a Job. You can find more information on configuring a
Step
in Configuring a Step.
4. Configuring and Running a Job
In the domain section, the overall architecture design was discussed, using the following diagram as a guide:
While the
Job
object may seem like a simple container for steps, you must be aware of many configuration options. Furthermore, you must consider many options about how aJob
can be run and how its metadata can be stored during that run. This chapter explains the various configuration options and runtime concerns of aJob
.
4.1. Configuring a Job
There are multiple implementations of the
Job
interface. However, builders abstract away the difference in configuration. The following example creates afootballJob
:

@Bean
public Job footballJob(JobRepository jobRepository) {
    return new JobBuilder("footballJob", jobRepository)
                     .start(playerLoad())
                     .next(gameLoad())
                     .next(playerSummarization())
                     .build();
}
A
Job
(and, typically, anyStep
within it) requires aJobRepository
. The configuration of theJobRepository
is handled through theJava Configuration
.The preceding example illustrates a
Job
that consists of threeStep
instances. The job related builders can also contain other elements that help with parallelization (Split
), declarative flow control (Decision
), and externalization of flow definitions (Flow
).There are multiple implementations of the
Job
interface. However, the namespace abstracts away the differences in configuration. It has only three required dependencies: a name,JobRepository
, and a list ofStep
instances. The following example creates afootballJob
:<job id="footballJob"> <step id="playerload" parent="s1" next="gameLoad"/> <step id="gameLoad" parent="s2" next="playerSummarization"/> <step id="playerSummarization" parent="s3"/>
The examples here use a parent bean definition to create the steps. See the section on step configuration for more options when declaring specific step details inline. The XML namespace defaults to referencing a repository with an ID of
jobRepository
, which is a sensible default. However, you can explicitly override it:

<job id="footballJob" job-repository="specialRepository">
    <step id="playerload"          parent="s1" next="gameLoad"/>
    <step id="gameLoad"            parent="s3" next="playerSummarization"/>
    <step id="playerSummarization" parent="s3"/>
</job>
In addition to steps, a job configuration can contain other elements that help with parallelization (
<split>
), declarative flow control (<decision>
) and externalization of flow definitions (<flow/>
).
4.1.1. Restartability
One key issue when executing a batch job concerns the behavior of a
Job
when it is restarted. The launching of aJob
is considered to be a “restart” if aJobExecution
already exists for the particularJobInstance
. Ideally, all jobs should be able to start up where they left off, but there are scenarios where this is not possible. In this scenario, it is entirely up to the developer to ensure that a newJobInstance
is created. However, Spring Batch does provide some help. If aJob
should never be restarted but should always be run as part of a newJobInstance
, you can set the restartable property tofalse
.The following example shows how to set the
restartable
field tofalse
in XML:
XML Configuration
<job id="footballJob" restartable="false">
The following example shows how to set the
restartable
field tofalse
in Java:Java Configuration
@Bean
public Job footballJob(JobRepository jobRepository) {
    return new JobBuilder("footballJob", jobRepository)
                     .preventRestart()
                     .build();
}
To phrase it another way, setting
restartable
tofalse
means “thisJob
does not support being started again”. Restarting aJob
that is not restartable causes aJobRestartException
to be thrown. The following JUnit code causes the exception to be thrown:
Job job = new SimpleJob();
job.setRestartable(false);

JobParameters jobParameters = new JobParameters();

JobExecution firstExecution = jobRepository.createJobExecution(job, jobParameters);
jobRepository.saveOrUpdate(firstExecution);

try {
    jobRepository.createJobExecution(job, jobParameters);
    fail();
}
catch (JobRestartException e) {
    // expected
}
The first attempt to create a
JobExecution
for a non-restartable job causes no issues. However, the second attempt throws aJobRestartException
.
4.1.2. Intercepting Job Execution
During the course of the execution of a
Job
, it may be useful to be notified of various events in its lifecycle so that custom code can be run. SimpleJob allows for this by calling a JobExecutionListener at the appropriate time:

public interface JobExecutionListener {

    void beforeJob(JobExecution jobExecution);

    void afterJob(JobExecution jobExecution);
}

The following example shows how to add a listener to a job defined in XML:
<job id="footballJob"> <step id="playerload" parent="s1" next="gameLoad"/> <step id="gameLoad" parent="s2" next="playerSummarization"/> <step id="playerSummarization" parent="s3"/> <listeners> <listener ref="sampleListener"/> </listeners>
The following example shows how to add a listener method to a Java job definition:
Java Configuration
@Bean
public Job footballJob(JobRepository jobRepository) {
    return new JobBuilder("footballJob", jobRepository)
                     .listener(sampleListener())
                     .build();
}
Note that the
afterJob
method is called regardless of the success or failure of theJob
. If you need to determine success or failure, you can get that information from theJobExecution
:
public void afterJob(JobExecution jobExecution){
    if (jobExecution.getStatus() == BatchStatus.COMPLETED ) {
        //job success
    }
    else if (jobExecution.getStatus() == BatchStatus.FAILED) {
        //job failure
    }
}
4.1.3. Inheriting from a Parent Job
If a group of Jobs share similar but not identical configurations, it may help to define a “parent”
Job
from which the concreteJob
instances can inherit properties. Similar to class inheritance in Java, a “child”Job
combines its elements and attributes with the parent’s.In the following example,
baseJob
is an abstractJob
definition that defines only a list of listeners. TheJob
(job1
) is a concrete definition that inherits the list of listeners frombaseJob
and merges it with its own list of listeners to produce aJob
with two listeners and oneStep
(step1
).
<job id="baseJob" abstract="true"> <listeners> <listener ref="listenerOne"/> <listeners> <job id="job1" parent="baseJob"> <step id="step1" parent="standaloneStep"/> <listeners merge="true"> <listener ref="listenerTwo"/> <listeners>
See the section on Inheriting from a Parent Step for more detailed information.
4.1.4. JobParametersValidator
A job declared in the XML namespace or using any subclass of
AbstractJob
can optionally declare a validator for the job parameters at runtime. This is useful when, for instance, you need to assert that a job is started with all its mandatory parameters. There is aDefaultJobParametersValidator
that you can use to constrain combinations of simple mandatory and optional parameters. For more complex constraints, you can implement the interface yourself.The configuration of a validator is supported through the XML namespace through a child element of the job, as the following example shows:
<job id="job1" parent="baseJob3"> <step id="step1" parent="standaloneStep"/> <validator ref="parametersValidator"/>
You can specify the validator as a reference (as shown earlier) or as a nested bean definition in the
beans
namespace.The configuration of a validator is supported through the Java builders:
@Bean
public Job job1(JobRepository jobRepository) {
    return new JobBuilder("job1", jobRepository)
                     .validator(parametersValidator())
                     .build();
}
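For simple cases, the provided DefaultJobParametersValidator can be configured with required and optional keys. For more complex constraints, a hand-written validator can implement the interface directly, as in the following sketch (the parameter name is illustrative):

import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersInvalidException;
import org.springframework.batch.core.JobParametersValidator;

public class ScheduleDateValidator implements JobParametersValidator {

    @Override
    public void validate(JobParameters parameters) throws JobParametersInvalidException {
        // reject any launch that does not carry the mandatory parameter
        if (parameters == null || parameters.getParameter("schedule.date") == null) {
            throw new JobParametersInvalidException("schedule.date is a required job parameter");
        }
    }
}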
4.2. Java Configuration
Spring 3 brought the ability to configure applications with Java instead of XML. As of Spring Batch 2.2.0, you can configure batch jobs by using the same Java configuration. There are three components for the Java-based configuration: the
@EnableBatchProcessing
annotation and two builders.The
@EnableBatchProcessing
annotation works similarly to the other@Enable*
annotations in the Spring family. In this case,@EnableBatchProcessing
provides a base configuration for building batch jobs. Within this base configuration, an instance ofStepScope
andJobScope
are created, in addition to a number of beans being made available to be autowired (a JobRepository, a JobLauncher, a JobRegistry, a JobExplorer, and a JobOperator, among others). The default implementation provides these beans and requires a
DataSource
and aPlatformTransactionManager
to be provided as beans within the context. The data source and transaction manager are used by theJobRepository
andJobExplorer
instances. By default, the data source nameddataSource
and the transaction manager namedtransactionManager
will be used. You can customize any of these beans by using the attributes of the@EnableBatchProcessing
annotation. The following example shows how to provide a custom data source and transaction manager:

@Configuration
@EnableBatchProcessing(dataSourceRef = "batchDataSource", transactionManagerRef = "batchTransactionManager")
public class MyJobConfiguration {

    @Bean
    public DataSource batchDataSource() {
        return new EmbeddedDatabaseBuilder().setType(EmbeddedDatabaseType.HSQL)
                .addScript("/org/springframework/batch/core/schema-hsqldb.sql")
                .generateUniqueName(true).build();
    }

    @Bean
    public JdbcTransactionManager batchTransactionManager(DataSource dataSource) {
        return new JdbcTransactionManager(dataSource);
    }

    @Bean
    public Job job(JobRepository jobRepository) {
        return new JobBuilder("myJob", jobRepository)
                //define job flow as needed
                .build();
    }

}
Starting from v5.0, an alternative, programmatic way of configuring base infrastructure beans is provided through the
DefaultBatchConfiguration
class. This class provides the same beans provided by@EnableBatchProcessing
and can be used as a base class to configure batch jobs. The following snippet is a typical example of how to use it:
@Configuration
class MyJobConfiguration extends DefaultBatchConfiguration {

    @Bean
    public Job job(JobRepository jobRepository) {
        return new JobBuilder("job", jobRepository)
                // define job flow as needed
                .build();
    }
}
The data source and transaction manager will be resolved from the application context and set on the job repository and job explorer. You can customize the configuration of any infrastructure bean by overriding the required setter. The following example shows how to customize the character encoding for instance:
@Configuration
class MyJobConfiguration extends DefaultBatchConfiguration {

    @Bean
    public Job job(JobRepository jobRepository) {
        return new JobBuilder("job", jobRepository)
                // define job flow as needed
                .build();
    }

    @Override
    protected Charset getCharset() {
        return StandardCharsets.ISO_8859_1;
    }
}
@EnableBatchProcessing
should not be used withDefaultBatchConfiguration
. You should either use the declarative way of configuring Spring Batch through@EnableBatchProcessing
, or use the programmatic way of extendingDefaultBatchConfiguration
, but not both ways at the same time.
4.3. Configuring a JobRepository
When using
@EnableBatchProcessing
, aJobRepository
is provided for you. This section describes how to configure your own.As described earlier, the
JobRepository
is used for basic CRUD operations of the various persisted domain objects within Spring Batch, such asJobExecution
andStepExecution
. It is required by many of the major framework features, such as theJobLauncher
,Job
, andStep
.The batch namespace abstracts away many of the implementation details of the
JobRepository
implementations and their collaborators. However, there are still a few configuration options available, as the following example shows:
XML Configuration
<job-repository id="jobRepository"
    data-source="dataSource"
    transaction-manager="transactionManager"
    isolation-level-for-create="SERIALIZABLE"
    table-prefix="BATCH_"
    max-varchar-length="1000"/>
Other than the
id
, none of the configuration options listed earlier are required. If they are not set, the defaults shown earlier are used. Themax-varchar-length
defaults to2500
, which is the length of the longVARCHAR
columns in the sample schema scripts.Other than the
dataSource
and thetransactionManager
, none of the configuration options listed earlier are required. If they are not set, the defaults shown earlier are used. The maxvarchar
length defaults to2500
, which is the length of the longVARCHAR
columns in the sample schema scripts.
4.3.1. Transaction Configuration for the JobRepository
If the namespace or the provided
FactoryBean
is used, transactional advice is automatically created around the repository. This is to ensure that the batch metadata, including state that is necessary for restarts after a failure, is persisted correctly. The behavior of the framework is not well defined if the repository methods are not transactional. The isolation level in thecreate*
method attributes is specified separately to ensure that, when jobs are launched, if two processes try to launch the same job at the same time, only one succeeds. The default isolation level for that method isSERIALIZABLE
, which is quite aggressive.READ_COMMITTED
usually works equally well.READ_UNCOMMITTED
is fine if two processes are not likely to collide in this way. However, since a call to thecreate*
method is quite short, it is unlikely that SERIALIZABLE causes problems, as long as the database platform supports it. However, you can override this setting.
The following example shows how to override the isolation level in XML:
XML Configuration<job-repository id="jobRepository" isolation-level-for-create="REPEATABLE_READ" />
The following example shows how to override the isolation level in Java:
Java Configuration
@Configuration
@EnableBatchProcessing(isolationLevelForCreate = "ISOLATION_REPEATABLE_READ")
public class MyJobConfiguration {

   // job definition

}
If the namespace is not used, you must also configure the transactional behavior of the repository by using AOP.
The following example shows how to configure the transactional behavior of the repository in XML:
XML Configuration
<aop:config>
    <aop:advisor
        pointcut="execution(* org.springframework.batch.core..*Repository+.*(..))"
        advice-ref="txAdvice" />
</aop:config>

<tx:advice id="txAdvice" transaction-manager="transactionManager">
    <tx:attributes>
        <tx:method name="*" />
    </tx:attributes>
</tx:advice>
You can use the preceding fragment nearly as is, with almost no changes. Remember also to include the appropriate namespace declarations and to make sure
spring-tx
andspring-aop
(or the whole of Spring) are on the classpath.The following example shows how to configure the transactional behavior of the repository in Java:
Java Configuration
@Bean
public TransactionProxyFactoryBean baseProxy() {
    TransactionProxyFactoryBean transactionProxyFactoryBean = new TransactionProxyFactoryBean();
    Properties transactionAttributes = new Properties();
    transactionAttributes.setProperty("*", "PROPAGATION_REQUIRED");
    transactionProxyFactoryBean.setTransactionAttributes(transactionAttributes);
    transactionProxyFactoryBean.setTarget(jobRepository());
    transactionProxyFactoryBean.setTransactionManager(transactionManager());
    return transactionProxyFactoryBean;
}
4.3.2. Changing the Table Prefix
Another modifiable property of the
JobRepository
is the table prefix of the meta-data tables. By default, they are all prefaced withBATCH_
.BATCH_JOB_EXECUTION
andBATCH_STEP_EXECUTION
are two examples. However, there are potential reasons to modify this prefix. If the schema names need to be prepended to the table names or if more than one set of metadata tables is needed within the same schema, the table prefix needs to be changed.The following example shows how to change the table prefix in XML:
XML Configuration<job-repository id="jobRepository" table-prefix="SYSTEM.TEST_" />
The following example shows how to change the table prefix in Java:
Java Configuration
@Configuration
@EnableBatchProcessing(tablePrefix = "SYSTEM.TEST_")
public class MyJobConfiguration {

   // job definition

}
4.3.3. Non-standard Database Types in a Repository
If you use a database platform that is not in the list of supported platforms, you may be able to use one of the supported types, if the SQL variant is close enough. To do this, you can use the raw
JobRepositoryFactoryBean
instead of the namespace shortcut and use it to set the database type to the closest match.The following example shows how to use
JobRepositoryFactoryBean
to set the database type to the closest match in XML:XML Configuration<bean id="jobRepository" class="org...JobRepositoryFactoryBean"> <property name="databaseType" value="db2"/> <property name="dataSource" ref="dataSource"/> </bean>
The following example shows how to use
JobRepositoryFactoryBean
to set the database type to the closest match in Java:
Java Configuration
@Bean
public JobRepository jobRepository() throws Exception {
    JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
    factory.setDataSource(dataSource);
    factory.setDatabaseType("db2");
    factory.setTransactionManager(transactionManager);
    return factory.getObject();
}
If the database type is not specified, the
JobRepositoryFactoryBean
tries to auto-detect the database type from theDataSource
. The major differences between platforms are mainly accounted for by the strategy for incrementing primary keys, so it is often necessary to override theincrementerFactory
as well (by using one of the standard implementations from the Spring Framework).If even that does not work or if you are not using an RDBMS, the only option may be to implement the various
Dao
interfaces that theSimpleJobRepository
depends on and wire one up manually in the normal Spring way.
4.4. Configuring a JobLauncher
When you use
@EnableBatchProcessing
, a JobLauncher is provided for you. This section describes how to configure your own. The most basic implementation of the
JobLauncher
interface is theTaskExecutorJobLauncher
. Its only required dependency is aJobRepository
(needed to obtain an execution).The following example shows a
TaskExecutorJobLauncher
in XML:XML Configuration<bean id="jobLauncher" class="org.springframework.batch.core.launch.support.TaskExecutorJobLauncher"> <property name="jobRepository" ref="jobRepository" /> </bean>
The following example shows a
TaskExecutorJobLauncher
in Java:
Java Configuration
@Bean
public JobLauncher jobLauncher() throws Exception {
    TaskExecutorJobLauncher jobLauncher = new TaskExecutorJobLauncher();
    jobLauncher.setJobRepository(jobRepository);
    jobLauncher.afterPropertiesSet();
    return jobLauncher;
}

Once a JobExecution is obtained, it is passed to the execute method of
Job
, ultimately returning theJobExecution
to the caller, as the following image shows:
The sequence is straightforward and works well when launched from a scheduler. However, issues arise when trying to launch from an HTTP request. In this scenario, the launching needs to be done asynchronously so that the
TaskExecutorJobLauncher
returns immediately to its caller. This is because it is not good practice to keep an HTTP request open for the amount of time needed by long running processes (such as batch jobs). The following image shows an example sequence:
You can configure the
TaskExecutorJobLauncher
to allow for this scenario by configuring aTaskExecutor
.The following XML example configures a
TaskExecutorJobLauncher
to return immediately:XML Configuration<bean id="jobLauncher" class="org.springframework.batch.core.launch.support.TaskExecutorJobLauncher"> <property name="jobRepository" ref="jobRepository" /> <property name="taskExecutor"> <bean class="org.springframework.core.task.SimpleAsyncTaskExecutor" /> </property> </bean>
The following Java example configures a
TaskExecutorJobLauncher
to return immediately:
Java Configuration
@Bean
public JobLauncher jobLauncher() {
    TaskExecutorJobLauncher jobLauncher = new TaskExecutorJobLauncher();
    jobLauncher.setJobRepository(jobRepository());
    jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());
    jobLauncher.afterPropertiesSet();
    return jobLauncher;
}
You can use any implementation of the spring
TaskExecutor
interface to control how jobs are asynchronously executed.
4.5. Running a Job
At a minimum, launching a batch job requires two things: the
Job
to be launched and aJobLauncher
. Both can be contained within the same context or different contexts. For example, if you launch jobs from the command line, a new JVM is instantiated for eachJob
. Thus, every job has its ownJobLauncher
. However, if you run from within a web container that is within the scope of anHttpRequest
, there is usually oneJobLauncher
(configured for asynchronous job launching) that multiple requests invoke to launch their jobs.
4.5.1. Running Jobs from the Command Line
If you want to run your jobs from an enterprise scheduler, the command line is the primary interface. This is because most schedulers (with the exception of Quartz, unless using
NativeJob
) work directly with operating system processes, primarily kicked off with shell scripts. There are many ways to launch a Java process besides a shell script, such as Perl, Ruby, or even build tools, such as Ant or Maven. However, because most people are familiar with shell scripts, this example focuses on them.
The CommandLineJobRunner
Because the script launching the job must kick off a Java Virtual Machine, there needs to be a class with a
main
method to act as the primary entry point. Spring Batch provides an implementation that serves this purpose:CommandLineJobRunner
. Note that this is just one way to bootstrap your application. There are many ways to launch a Java process, and this class should in no way be viewed as definitive. TheCommandLineJobRunner
performs four tasks:
Load the appropriate ApplicationContext.
Parse command line arguments into JobParameters.
Locate the appropriate job based on arguments.
Use the JobLauncher provided in the application context to launch the job.
All of these tasks are accomplished with only the arguments passed in. The following table describes the required arguments:
Table 14. CommandLineJobRunner arguments
jobPath
The location of the XML file that is used to create an
ApplicationContext
. This file should contain everything needed to run the completeJob
.
jobName
The name of the job to be run.
These arguments must be passed in, with the path first and the name second. All arguments after these are considered to be job parameters, are turned into a
JobParameters
object, and must be in the format ofname=value
.The following example shows a date passed as a job parameter to a job defined in XML:
bash$ java CommandLineJobRunner endOfDayJob.xml endOfDay schedule.date=2007-05-05,java.time.LocalDate
The following example shows a date passed as a job parameter to a job defined in Java:
bash$ java CommandLineJobRunner io.spring.EndOfDayJobConfiguration endOfDay schedule.date=2007-05-05,java.time.LocalDate
By default, the
CommandLineJobRunner
uses aDefaultJobParametersConverter
that implicitly converts key/value pairs to identifying job parameters. However, you can explicitly specify which job parameters are identifying and which are not by suffixing them withtrue
orfalse
, respectively.In the following example,
schedule.date
is an identifying job parameter, whilevendor.id
is not:

bash$ java CommandLineJobRunner endOfDayJob.xml endOfDay \
                 schedule.date=2007-05-05,java.time.LocalDate,true \
                 vendor.id=123,java.lang.Long,false
bash$ java CommandLineJobRunner io.spring.EndOfDayJobConfiguration endOfDay \
                 schedule.date=2007-05-05,java.time.LocalDate,true \
                 vendor.id=123,java.lang.Long,false
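The same distinction can be made when building parameters programmatically: the JobParametersBuilder methods accept an identifying flag. The following sketch (using a plain String for the date, for brevity) builds the equivalent parameter set:

import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;

public class ParametersExample {

    public JobParameters buildParameters() {
        return new JobParametersBuilder()
                .addString("schedule.date", "2007-05-05", true) // identifying
                .addLong("vendor.id", 123L, false)              // non-identifying
                .toJobParameters();
    }
}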
You can override this behavior by using a custom
JobParametersConverter
.In most cases, you would want to use a manifest to declare your
main
class in a jar. However, for simplicity, the class was used directly. This example uses theEndOfDay
example from the The Domain Language of Batch. The first argument isendOfDayJob.xml
, which is the Spring ApplicationContext that contains theJob
. The second argument,endOfDay,
represents the job name. The final argument,schedule.date=2007-05-05,java.time.LocalDate
, is converted into aJobParameter
object of typejava.time.LocalDate
.The following example shows a sample configuration for
endOfDay
in XML:<job id="endOfDay"> <step id="step1" parent="simpleStep" /> <!-- Launcher details removed for clarity --> <beans:bean id="jobLauncher" class="org.springframework.batch.core.launch.support.TaskExecutorJobLauncher" />
In most cases, you would want to use a manifest to declare your
main
class in a jar. However, for simplicity, the class was used directly. This example uses theEndOfDay
example from the The Domain Language of Batch. The first argument isio.spring.EndOfDayJobConfiguration
, which is the fully qualified class name to the configuration class that contains the Job. The second argument,endOfDay
, represents the job name. The final argument,schedule.date=2007-05-05,java.time.LocalDate
, is converted into aJobParameter
object of typejava.time.LocalDate
.The following example shows a sample configuration for
endOfDay
in Java:

@Configuration
@EnableBatchProcessing
public class EndOfDayJobConfiguration {

    @Bean
    public Job endOfDay(JobRepository jobRepository, Step step1) {
        return new JobBuilder("endOfDay", jobRepository)
                    .start(step1)
                    .build();
    }

    @Bean
    public Step step1(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
        return new StepBuilder("step1", jobRepository)
                    .tasklet((contribution, chunkContext) -> null, transactionManager)
                    .build();
    }
}
The preceding example is overly simplistic, since there are many more requirements to running a batch job in Spring Batch in general, but it serves to show the two main requirements of the
CommandLineJobRunner
: Job and JobLauncher.
Exit Codes
When launching a batch job from the command-line, an enterprise scheduler is often used. Most schedulers are fairly dumb and work only at the process level. This means that they only know about some operating system process (such as a shell script that they invoke). In this scenario, the only way to communicate back to the scheduler about the success or failure of a job is through return codes. A return code is a number that is returned to a scheduler by the process to indicate the result of the run. In the simplest case, 0 is success and 1 is failure. However, there may be more complex scenarios, such as “If job A returns 4, kick off job B, and, if it returns 5, kick off job C.” This type of behavior is configured at the scheduler level, but it is important that a processing framework such as Spring Batch provide a way to return a numeric representation of the exit code for a particular batch job. In Spring Batch, this is encapsulated within an
ExitStatus
, which is covered in more detail in Chapter 5. For the purposes of discussing exit codes, the only important thing to know is that anExitStatus
has an exit code property that is set by the framework (or the developer) and is returned as part of theJobExecution
returned from theJobLauncher
. TheCommandLineJobRunner
converts this string value to a number by using theExitCodeMapper
interface:

public interface ExitCodeMapper {

    public int intValue(String exitCode);

}
The essential contract of an
ExitCodeMapper
is that, given a string exit code, a number representation will be returned. The default implementation used by the job runner is theSimpleJvmExitCodeMapper
that returns 0 for completion, 1 for generic errors, and 2 for any job runner errors such as not being able to find aJob
in the provided context. If anything more complex than the three values above is needed, a custom implementation of theExitCodeMapper
interface must be supplied. Because theCommandLineJobRunner
is the class that creates anApplicationContext
and, thus, cannot be 'wired together', any values that need to be overwritten must be autowired. This means that if an implementation ofExitCodeMapper
is found within theBeanFactory
, it is injected into the runner after the context is created. All that needs to be done to provide your ownExitCodeMapper
is to declare the implementation as a root level bean and ensure that it is part of theApplicationContext
that is loaded by the runner.
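A custom mapping might look like the following sketch; the custom exit-code string and the numbers chosen here are illustrative, not framework defaults:

import org.springframework.batch.core.launch.support.ExitCodeMapper;

public class OnDemandExitCodeMapper implements ExitCodeMapper {

    @Override
    public int intValue(String exitCode) {
        if ("COMPLETED".equals(exitCode)) {
            return 0;
        }
        if ("COMPLETED WITH SKIPS".equals(exitCode)) { // hypothetical custom ExitStatus
            return 4;
        }
        return 1; // anything else is treated as a generic failure
    }
}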
4.5.2. Running Jobs from within a Web Container
Historically, offline processing (such as batch jobs) has been launched from the command-line, as described earlier. However, there are many cases where launching from an
HttpRequest
is a better option. Many such use cases include reporting, ad-hoc job running, and web application support. Because a batch job (by definition) is long running, the most important concern is to launch the job asynchronously:The controller in this case is a Spring MVC controller. See the Spring Framework Reference Guide for more about Spring MVC. The controller launches a
Job
by using aJobLauncher
that has been configured to launch asynchronously, which immediately returns aJobExecution
. TheJob
is likely still running. However, this nonblocking behavior lets the controller return immediately, which is required when handling anHttpRequest
. The following listing shows an example:

@Controller
public class JobLauncherController {

    @Autowired
    JobLauncher jobLauncher;

    @Autowired
    Job job;

    @RequestMapping("/jobLauncher.html")
    public void handle() throws Exception{
        jobLauncher.run(job, new JobParameters());
    }
}
4.6. Advanced Metadata Usage
So far, both the
JobLauncher
andJobRepository
interfaces have been discussed. Together, they represent the simple launching of a job and basic CRUD operations of batch domain objects: a JobLauncher uses the JobRepository to create new JobExecution objects and run them. Job and Step
implementations later use the sameJobRepository
for basic updates of the same executions during the running of aJob
. The basic operations suffice for simple scenarios. However, in a large batch environment with hundreds of batch jobs and complex scheduling requirements, more advanced access to the metadata is required. The
JobExplorer
andJobOperator
interfaces, which are discussed in the coming sections, add additional functionality for querying and controlling the metadata.
4.6.1. Querying the Repository
The most basic need before any advanced features is the ability to query the repository for existing executions. This functionality is provided by the
JobExplorer
interface:

public interface JobExplorer {

    List<JobInstance> getJobInstances(String jobName, int start, int count);

    JobExecution getJobExecution(Long executionId);

    StepExecution getStepExecution(Long jobExecutionId, Long stepExecutionId);

    JobInstance getJobInstance(Long instanceId);

    List<JobExecution> getJobExecutions(JobInstance jobInstance);

    Set<JobExecution> findRunningJobExecutions(String jobName);
}
As is evident from its method signatures,
JobExplorer
is a read-only version of theJobRepository
, and, like theJobRepository
, it can be easily configured by using a factory bean.The following example shows how to configure a
JobExplorer
in XML:XML Configuration<bean id="jobExplorer" class="org.spr...JobExplorerFactoryBean" p:dataSource-ref="dataSource" />
The following example shows how to configure a
JobExplorer
in Java:
Java Configuration
// This would reside in your DefaultBatchConfiguration extension
@Bean
public JobExplorer jobExplorer() throws Exception {
    JobExplorerFactoryBean factoryBean = new JobExplorerFactoryBean();
    factoryBean.setDataSource(this.dataSource);
    return factoryBean.getObject();
}
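As a usage sketch (assuming a job named endOfDay has run at least once), querying the most recent instance and printing its executions might look like this:

import java.util.List;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobInstance;
import org.springframework.batch.core.explore.JobExplorer;

public class LastRunReporter {

    public void printLastRun(JobExplorer jobExplorer) {
        // most recent instance first
        List<JobInstance> instances = jobExplorer.getJobInstances("endOfDay", 0, 1);
        if (!instances.isEmpty()) {
            List<JobExecution> executions = jobExplorer.getJobExecutions(instances.get(0));
            for (JobExecution execution : executions) {
                System.out.println(execution.getStatus() + " ended at " + execution.getEndTime());
            }
        }
    }
}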
Earlier in this chapter, we noted that you can modify the table prefix of the
JobRepository
to allow for different versions or schemas. Because theJobExplorer
works with the same tables, it also needs the ability to set a prefix.The following example shows how to set the table prefix for a
JobExplorer
in XML:XML Configuration<bean id="jobExplorer" class="org.spr...JobExplorerFactoryBean" p:tablePrefix="SYSTEM."/>
The following example shows how to set the table prefix for a
JobExplorer
in Java:
Java Configuration
// This would reside in your DefaultBatchConfiguration extension
@Bean
public JobExplorer jobExplorer() throws Exception {
    JobExplorerFactoryBean factoryBean = new JobExplorerFactoryBean();
    factoryBean.setDataSource(this.dataSource);
    factoryBean.setTablePrefix("SYSTEM.");
    return factoryBean.getObject();
}

4.6.2. JobRegistry
A
JobRegistry
(and its parent interface,JobLocator
) is not mandatory, but it can be useful if you want to keep track of which jobs are available in the context. It is also useful for collecting jobs centrally in an application context when they have been created elsewhere (for example, in child contexts). You can also use customJobRegistry
implementations to manipulate the names and other properties of the jobs that are registered. There is only one implementation provided by the framework and this is based on a simple map from job name to job instance.The following example shows how to include a
JobRegistry
for a job defined in XML:<bean id="jobRegistry" class="org.springframework.batch.core.configuration.support.MapJobRegistry" />
When using @EnableBatchProcessing, a JobRegistry is provided for you. The following example shows how to configure your own JobRegistry:

// This is already provided via the @EnableBatchProcessing but can be customized via
// overriding the bean in the DefaultBatchConfiguration
@Override
@Bean
public JobRegistry jobRegistry() throws Exception {
    return new MapJobRegistry();
}

You can populate a JobRegistry in either of two ways: by using a bean post processor or by using a registrar lifecycle component. The coming sections describe these two mechanisms.
JobRegistryBeanPostProcessor
This is a bean post-processor that can register all jobs as they are created.
The following example shows how to include the
JobRegistryBeanPostProcessor
for a job defined in XML:XML Configuration<bean id="jobRegistryBeanPostProcessor" class="org.spr...JobRegistryBeanPostProcessor"> <property name="jobRegistry" ref="jobRegistry"/> </bean>
The following example shows how to include the
JobRegistryBeanPostProcessor
for a job defined in Java:
Java Configuration
@Bean
public JobRegistryBeanPostProcessor jobRegistryBeanPostProcessor(JobRegistry jobRegistry) {
    JobRegistryBeanPostProcessor postProcessor = new JobRegistryBeanPostProcessor();
    postProcessor.setJobRegistry(jobRegistry);
    return postProcessor;
}
Although it is not strictly necessary, the post-processor in the example has been given an
id
so that it can be included in child contexts (for example, as a parent bean definition) and cause all jobs created there to also be registered automatically.
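Once populated, the registry can be used to look up jobs by name through its JobLocator contract, as in the following sketch (the job name is illustrative):

import org.springframework.batch.core.Job;
import org.springframework.batch.core.configuration.JobRegistry;
import org.springframework.batch.core.launch.NoSuchJobException;

public class RegistryLookupExample {

    public Job lookup(JobRegistry jobRegistry) throws NoSuchJobException {
        // resolve a registered job by name, typically before launching it
        return jobRegistry.getJob("footballJob");
    }
}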
AutomaticJobRegistrar
This is a lifecycle component that creates child contexts and registers jobs from those contexts as they are created. One advantage of doing this is that, while the job names in the child contexts still have to be globally unique in the registry, their dependencies can have “natural” names. So, for example, you can create a set of XML configuration files that each have only one Job but that all have different definitions of an
ItemReader
with the same bean name, such asreader
. If all those files were imported into the same context, the reader definitions would clash and override one another, but, with the automatic registrar, this is avoided. This makes it easier to integrate jobs that have been contributed from separate modules of an application.The following example shows how to include the
AutomaticJobRegistrar
for a job defined in XML:XML Configuration<bean class="org.spr...AutomaticJobRegistrar"> <property name="applicationContextFactories"> <bean class="org.spr...ClasspathXmlApplicationContextsFactoryBean"> <property name="resources" value="classpath*:/config/job*.xml" /> </bean> </property> <property name="jobLoader"> <bean class="org.spr...DefaultJobLoader"> <property name="jobRegistry" ref="jobRegistry" /> </bean> </property> </bean>
The following example shows how to include the
AutomaticJobRegistrar
for a job defined in Java:
Java Configuration
@Bean
public AutomaticJobRegistrar registrar() {
    AutomaticJobRegistrar registrar = new AutomaticJobRegistrar();
    registrar.setJobLoader(jobLoader());
    registrar.setApplicationContextFactories(applicationContextFactories());
    registrar.afterPropertiesSet();
    return registrar;
}
The registrar has two mandatory properties: an array of
ApplicationContextFactory
(created from a convenient factory bean in the preceding example) and aJobLoader
. TheJobLoader
is responsible for managing the lifecycle of the child contexts and registering jobs in theJobRegistry
.The
ApplicationContextFactory
is responsible for creating the child context. The most common usage is (as in the preceding example) to use aClassPathXmlApplicationContextFactory
. One of the features of this factory is that, by default, it copies some of the configuration down from the parent context to the child. So, for instance, you need not redefine thePropertyPlaceholderConfigurer
or AOP configuration in the child, provided it should be the same as the parent.You can use
AutomaticJobRegistrar
in conjunction with aJobRegistryBeanPostProcessor
(as long as you also useDefaultJobLoader
). For instance, this might be desirable if there are jobs defined in the main parent context as well as in the child locations.
4.6.3. JobOperator
As previously discussed, the
JobRepository
provides CRUD operations on the meta-data, and theJobExplorer
provides read-only operations on the metadata. However, those operations are most useful when used together to perform common monitoring tasks such as stopping, restarting, or summarizing a Job, as is commonly done by batch operators. Spring Batch provides these types of operations in theJobOperator
interface:
public interface JobOperator {

    List<Long> getExecutions(long instanceId) throws NoSuchJobInstanceException;

    List<Long> getJobInstances(String jobName, int start, int count) throws NoSuchJobException;

    Set<Long> getRunningExecutions(String jobName) throws NoSuchJobException;

    String getParameters(long executionId) throws NoSuchJobExecutionException;

    Long start(String jobName, String parameters) throws NoSuchJobException, JobInstanceAlreadyExistsException;

    Long restart(long executionId) throws JobInstanceAlreadyCompleteException, NoSuchJobExecutionException, NoSuchJobException, JobRestartException;

    Long startNextInstance(String jobName) throws NoSuchJobException, JobParametersNotFoundException, JobRestartException, JobExecutionAlreadyRunningException, JobInstanceAlreadyCompleteException;

    boolean stop(long executionId) throws NoSuchJobExecutionException, JobExecutionNotRunningException;

    String getSummary(long executionId) throws NoSuchJobExecutionException;

    Map<Long, String> getStepExecutionSummaries(long executionId) throws NoSuchJobExecutionException;

    Set<String> getJobNames();
}
The preceding operations represent methods from many different interfaces, such as
JobLauncher
,JobRepository
,JobExplorer
, andJobRegistry
. For this reason, the provided implementation ofJobOperator
(SimpleJobOperator
) has many dependencies.The following example shows a typical bean definition for
SimpleJobOperator
in XML:<bean id="jobOperator" class="org.spr...SimpleJobOperator"> <property name="jobExplorer"> <bean class="org.spr...JobExplorerFactoryBean"> <property name="dataSource" ref="dataSource" /> </bean> </property> <property name="jobRepository" ref="jobRepository" /> <property name="jobRegistry" ref="jobRegistry" /> <property name="jobLauncher" ref="jobLauncher" /> </bean>
The following example shows a typical bean definition for SimpleJobOperator in Java:

/**
 * All injected dependencies for this bean are provided by the @EnableBatchProcessing
 * infrastructure out of the box.
 */
@Bean
public SimpleJobOperator jobOperator(JobExplorer jobExplorer, JobRepository jobRepository,
                                     JobRegistry jobRegistry, JobLauncher jobLauncher) {

    SimpleJobOperator jobOperator = new SimpleJobOperator();
    jobOperator.setJobExplorer(jobExplorer);
    jobOperator.setJobRepository(jobRepository);
    jobOperator.setJobRegistry(jobRegistry);
    jobOperator.setJobLauncher(jobLauncher);
    return jobOperator;
}

As of version 5.0, the
@EnableBatchProcessing
annotation automatically registers a job operator bean in the application context.
Most of the methods on
JobOperator
are self-explanatory, and you can find more detailed explanations in the Javadoc of the interface. However, thestartNextInstance
method is worth noting. This method always starts a new instance of aJob
. This can be extremely useful if there are serious issues in aJobExecution
and theJob
needs to be started over again from the beginning. UnlikeJobLauncher
(which requires a newJobParameters
object that triggers a new JobInstance if the parameters are different from any previous set of parameters), the
method uses theJobParametersIncrementer
tied to theJob
to force theJob
to a new instance:

public interface JobParametersIncrementer {

    JobParameters getNext(JobParameters parameters);

}
The contract of
JobParametersIncrementer
is that, given a JobParameters object, it returns the “next”JobParameters
object by incrementing any necessary values it may contain. This strategy is useful because the framework has no way of knowing what changes to theJobParameters
make it the “next” instance. For example, if the only value inJobParameters
is a date and the next instance should be created, should that value be incremented by one day or one week (if the job is weekly, for instance)? The same can be said for any numerical values that help to identify theJob
, as the following example shows:

public class SampleIncrementer implements JobParametersIncrementer {

    public JobParameters getNext(JobParameters parameters) {
        if (parameters==null || parameters.isEmpty()) {
            return new JobParametersBuilder().addLong("run.id", 1L).toJobParameters();
        }
        long id = parameters.getLong("run.id",1L) + 1;
        return new JobParametersBuilder().addLong("run.id", id).toJobParameters();
    }
}
In this example, the value with a key of
run.id
is used to discriminate betweenJobInstances
. If theJobParameters
passed in is null, it can be assumed that theJob
has never been run before and, thus, its initial state can be returned. However, if not, the old value is obtained, incremented by one, and returned.For jobs defined in XML, you can associate an incrementer with a
Job
through theincrementer
attribute in the namespace, as follows:<job id="footballJob" incrementer="sampleIncrementer">
For jobs defined in Java, you can associate an incrementer with a
Job
through theincrementer
method provided in the builders, as follows:

@Bean
public Job footballJob(JobRepository jobRepository) {
    return new JobBuilder("footballJob", jobRepository)
                     .incrementer(sampleIncrementer())
                     .build();
}
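4.6.5. Stopping a Job
A common operator task is to stop a running execution gracefully through the stop method of the JobOperator. The following sketch (the job name is illustrative) finds the running executions of a job and requests that they stop:

import java.util.Set;
import org.springframework.batch.core.launch.JobOperator;

public class StopExample {

    public void stopRunning(JobOperator jobOperator) throws Exception {
        Set<Long> runningExecutions = jobOperator.getRunningExecutions("endOfDay");
        for (Long executionId : runningExecutions) {
            // sends a stop signal; the execution stops once control returns to the framework
            jobOperator.stop(executionId);
        }
    }
}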
The shutdown is not immediate, since there is no way to force immediate shutdown, especially if the execution is currently in developer code that the framework has no control over, such as a business service. However, as soon as control is returned back to the framework, it sets the status of the current
StepExecution
toBatchStatus.STOPPED
, saves it, and does the same for theJobExecution
before finishing.
4.6.6. Aborting a Job
A job execution that is
FAILED
can be restarted (if theJob
is restartable). A job execution whose status isABANDONED
cannot be restarted by the framework. TheABANDONED
status is also used in step executions to mark them as skippable in a restarted job execution. If a job is running and encounters a step that has been markedABANDONED
in the previous failed job execution, it moves on to the next step (as determined by the job flow definition and the step execution exit status).If the process died (
kill -9
or server failure), the job is, of course, not running, but theJobRepository
has no way of knowing because no one told it before the process died. You have to tell it manually that you know that the execution either failed or should be considered aborted (change its status toFAILED
orABANDONED
). This is a business decision, and there is no way to automate it. Change the status toFAILED
only if it is restartable and you know that the restart data is valid.
5. Configuring a Step
As discussed in the domain chapter, a
Step
is a domain object that encapsulates an independent, sequential phase of a batch job and contains all of the information necessary to define and control the actual batch processing. This is a necessarily vague description because the contents of any givenStep
are at the discretion of the developer writing aJob
. AStep
can be as simple or complex as the developer desires. A simpleStep
might load data from a file into the database, requiring little or no code (depending upon the implementations used). A more complexStep
might have complicated business rules that are applied as part of the processing, as the following image shows:
5.1. Chunk-oriented Processing
Spring Batch uses a “chunk-oriented” processing style in its most common implementation. Chunk-oriented processing refers to reading the data one item at a time and creating 'chunks' that are written out within a transaction boundary. Once the number of items read equals the commit interval, the entire chunk is written out by the
ItemWriter
, and then the transaction is committed. The following image shows the process, and the following pseudocode shows the same concept in a simplified form:

List items = new ArrayList();
for(int i = 0; i < commitInterval; i++){
    Object item = itemReader.read();
    if (item != null) {
        items.add(item);
    }
}
itemWriter.write(items);
You can also configure a chunk-oriented step with an optional
ItemProcessor
to process items before passing them to theItemWriter
. The following image shows the process when anItemProcessor
is registered in the step:
List items = new ArrayList();
for(int i = 0; i < commitInterval; i++){
    Object item = itemReader.read();
    if (item != null) {
        items.add(item);
    }
}

List processedItems = new ArrayList();
for(Object item: items){
    Object processedItem = itemProcessor.process(item);
    if (processedItem != null) {
        processedItems.add(processedItem);
    }
}

itemWriter.write(processedItems);
For more details about item processors and their use cases, see the Item processing section.
5.1.1. Configuring a Step
Despite the relatively short list of required dependencies for a
Step
, it is an extremely complex class that can potentially contain many collaborators.To ease configuration, you can use the Spring Batch XML namespace, as the following example shows:
XML Configuration<job id="sampleJob" job-repository="jobRepository"> <step id="step1"> <tasklet transaction-manager="transactionManager"> <chunk reader="itemReader" writer="itemWriter" commit-interval="10"/> </tasklet> </step>
When using Java configuration, you can use the Spring Batch builders, as the following example shows:
Java Configuration
/**
 * Note the JobRepository is typically autowired in and not needed to be explicitly
 * configured
 */
@Bean
public Job sampleJob(JobRepository jobRepository, Step sampleStep) {
    return new JobBuilder("sampleJob", jobRepository)
                .start(sampleStep)
                .build();
}

/**
 * Note the TransactionManager is typically autowired in and not needed to be explicitly
 * configured
 */
@Bean
public Step sampleStep(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
    return new StepBuilder("sampleStep", jobRepository)
                .<String, String>chunk(10, transactionManager)
                .reader(itemReader())
                .writer(itemWriter())
                .build();
}

The preceding configuration includes the only required dependencies to create an item-oriented step:
job-repository
: The XML-specific name of theJobRepository
that periodically stores theStepExecution
andExecutionContext
during processing (just before committing). For an in-line<step/>
(one defined within a<job/>
), it is an attribute on the<job/>
element. For a standalone<step/>
, it is defined as an attribute of the<tasklet/>
.
chunk
: The Java-specific name of the dependency that indicates that this is an item-based step and the number of items to be processed before the transaction is committed.Note that
job-repository
defaults tojobRepository
andtransaction-manager
defaults totransactionManager
. Also, theItemProcessor
is optional, since the item could be directly passed from the reader to the writer.Note that
repository
defaults tojobRepository
(provided through@EnableBatchProcessing
) andtransactionManager
defaults totransactionManager
(provided from the application context). Also, theItemProcessor
is optional, since the item could be directly passed from the reader to the writer.
5.1.2. Inheriting from a Parent Step
If a group of
Steps
share similar configurations, then it may be helpful to define a “parent”Step
from which the concreteSteps
may inherit properties. Similar to class inheritance in Java, the “child”Step
combines its elements and attributes with the parent’s. The child also overrides any of the parent’sSteps
.In the following example, the
Step
,concreteStep1
, inherits fromparentStep
. It is instantiated withitemReader
,itemProcessor
,itemWriter
,startLimit=5
, andallowStartIfComplete=true
. Additionally, thecommitInterval
is5
, since it is overridden by theconcreteStep1
Step
, as the following example shows:<step id="parentStep"> <tasklet allow-start-if-complete="true"> <chunk reader="itemReader" writer="itemWriter" commit-interval="10"/> </tasklet> </step> <step id="concreteStep1" parent="parentStep"> <tasklet start-limit="5"> <chunk processor="itemProcessor" commit-interval="5"/> </tasklet> </step>
The
id
attribute is still required on the step within the job element. This is for two reasons:
Abstract Step
Sometimes, it may be necessary to define a parent
Step
that is not a completeStep
configuration. If, for instance, thereader
,writer
, andtasklet
attributes are left off of aStep
configuration, then initialization fails. If a parent must be defined without one or more of these properties, theabstract
attribute should be used. Anabstract
Step
is only extended, never instantiated.In the following example, the
Step
(abstractParentStep
) would not be instantiated if it were not declared to be abstract. TheStep
, (concreteStep2
) hasitemReader
,itemWriter
, andcommit-interval=10
.<step id="abstractParentStep" abstract="true"> <tasklet> <chunk commit-interval="10"/> </tasklet> </step> <step id="concreteStep2" parent="abstractParentStep"> <tasklet> <chunk reader="itemReader" writer="itemWriter"/> </tasklet> </step>
Merging Lists
Some of the configurable elements on
Steps
are lists, such as the<listeners/>
element. If both the parent and childSteps
declare a<listeners/>
element, the child’s list overrides the parent’s. To allow a child to add additional listeners to the list defined by the parent, every list element has amerge
attribute. If the element specifies thatmerge="true"
, then the child’s list is combined with the parent’s instead of overriding it.In the following example, the
Step
"concreteStep3", is created with two listeners:listenerOne
andlistenerTwo
:<step id="listenersParentStep" abstract="true"> <listeners> <listener ref="listenerOne"/> <listeners> </step> <step id="concreteStep3" parent="listenersParentStep"> <tasklet> <chunk reader="itemReader" writer="itemWriter" commit-interval="5"/> </tasklet> <listeners merge="true"> <listener ref="listenerTwo"/> <listeners> </step>
5.1.3. The Commit Interval
As mentioned previously, a step reads in and writes out items, periodically committing by using the supplied
PlatformTransactionManager
. With acommit-interval
of 1, it commits after writing each individual item. This is less than ideal in many situations, since beginning and committing a transaction is expensive. Ideally, it is preferable to process as many items as possible in each transaction, which is completely dependent upon the type of data being processed and the resources with which the step is interacting. For this reason, you can configure the number of items that are processed within a commit.The following example shows a
step
whosetasklet
has acommit-interval
value of 10 as it would be defined in XML:XML Configuration<job id="sampleJob"> <step id="step1"> <tasklet> <chunk reader="itemReader" writer="itemWriter" commit-interval="10"/> </tasklet> </step>
The following example shows a
step
whosetasklet
has acommit-interval
value of 10 as it would be defined in Java:Java Configuration
@Bean public Job sampleJob(JobRepository jobRepository) { return new JobBuilder("sampleJob", jobRepository) .start(step1()) .build(); @Bean public Step step1(JobRepository jobRepository, PlatformTransactionManager transactionManager) { return new StepBuilder("step1", jobRepository) .<String, String>chunk(10, transactionManager) .reader(itemReader()) .writer(itemWriter()) .build();
In the preceding example, 10 items are processed within each transaction. At the beginning of processing, a transaction is begun. Also, each time
read
is called on theItemReader
, a counter is incremented. When it reaches 10, the list of aggregated items is passed to theItemWriter
, and the transaction is committed.
5.1.4. Configuring a
Step
for Restart

In the “Configuring and Running a Job” section, restarting a
Job
was discussed. Restart has numerous impacts on steps, and, consequently, may require some specific configuration.

Setting a Start Limit
There are many scenarios where you may want to control the number of times a
Step
can be started. For example, you might need to configure a particularStep
so that it runs only once because it invalidates some resource that must be fixed manually before it can be run again. This is configurable on the step level, since different steps may have different requirements. A Step
that can be executed only once can exist as part of the sameJob
as aStep
that can be run infinitely.The following code fragment shows an example of a start limit configuration in XML:
XML Configuration<step id="step1"> <tasklet start-limit="1"> <chunk reader="itemReader" writer="itemWriter" commit-interval="10"/> </tasklet> </step>
The following code fragment shows an example of a start limit configuration in Java:
Java Configuration@Bean public Step step1(JobRepository jobRepository, PlatformTransactionManager transactionManager) { return new StepBuilder("step1", jobRepository) .<String, String>chunk(10, transactionManager) .reader(itemReader()) .writer(itemWriter()) .startLimit(1) .build();
The step shown in the preceding example can be run only once. Attempting to run it again causes a
StartLimitExceededException
to be thrown. Note that the default value for the start-limit isInteger.MAX_VALUE
.
Restarting a Completed
Step
In the case of a restartable job, there may be one or more steps that should always be run, regardless of whether or not they were successful the first time. An example might be a validation step or a
Step
that cleans up resources before processing. During normal processing of a restarted job, any step with a status ofCOMPLETED
(meaning it has already been completed successfully), is skipped. Settingallow-start-if-complete
totrue
overrides this so that the step always runs.The following code fragment shows how to define a restartable job in XML:
XML Configuration<step id="step1"> <tasklet allow-start-if-complete="true"> <chunk reader="itemReader" writer="itemWriter" commit-interval="10"/> </tasklet> </step>
The following code fragment shows how to define a restartable job in Java:
Java Configuration@Bean public Step step1(JobRepository jobRepository, PlatformTransactionManager transactionManager) { return new StepBuilder("step1", jobRepository) .<String, String>chunk(10, transactionManager) .reader(itemReader()) .writer(itemWriter()) .allowStartIfComplete(true) .build();
Step
Restart Configuration ExampleThe following XML example shows how to configure a job to have steps that can be restarted:
XML Configuration<job id="footballJob" restartable="true"> <step id="playerload" next="gameLoad"> <tasklet> <chunk reader="playerFileItemReader" writer="playerWriter" commit-interval="10" /> </tasklet> </step> <step id="gameLoad" next="playerSummarization"> <tasklet allow-start-if-complete="true"> <chunk reader="gameFileItemReader" writer="gameWriter" commit-interval="10"/> </tasklet> </step> <step id="playerSummarization"> <tasklet start-limit="2"> <chunk reader="playerSummarizationSource" writer="summaryWriter" commit-interval="10"/> </tasklet> </step>
The following Java example shows how to configure a job to have steps that can be restarted:
Java Configuration
@Bean public Job footballJob(JobRepository jobRepository) { return new JobBuilder("footballJob", jobRepository) .start(playerLoad()) .next(gameLoad()) .next(playerSummarization()) .build(); @Bean public Step playerLoad(JobRepository jobRepository, PlatformTransactionManager transactionManager) { return new StepBuilder("playerLoad", jobRepository) .<String, String>chunk(10, transactionManager) .reader(playerFileItemReader()) .writer(playerWriter()) .build(); @Bean public Step gameLoad(JobRepository jobRepository, PlatformTransactionManager transactionManager) { return new StepBuilder("gameLoad", jobRepository) .allowStartIfComplete(true) .<String, String>chunk(10, transactionManager) .reader(gameFileItemReader()) .writer(gameWriter()) .build(); @Bean public Step playerSummarization(JobRepository jobRepository, PlatformTransactionManager transactionManager) { return new StepBuilder("playerSummarization", jobRepository) .startLimit(2) .<String, String>chunk(10, transactionManager) .reader(playerSummarizationSource()) .writer(summaryWriter()) .build();
The preceding example configuration is for a job that loads in information about football games and summarizes them. It contains three steps:
playerLoad
,gameLoad
, andplayerSummarization
. TheplayerLoad
step loads player information from a flat file, while thegameLoad
step does the same for games. The final step,playerSummarization
, then summarizes the statistics for each player, based upon the provided games. It is assumed that the file loaded byplayerLoad
must be loaded only once but thatgameLoad
can load any games found within a particular directory, deleting them after they have been successfully loaded into the database. As a result, theplayerLoad
step contains no additional configuration. It can be started any number of times and, if complete, is skipped. The gameLoad
step, however, needs to be run every time in case extra files have been added since it last ran. It hasallow-start-if-complete
set totrue
to always be started. (It is assumed that the database table that games are loaded into has a process indicator on it, to ensure new games can be properly found by the summarization step.) The summarization step, which is the most important in the job, is configured to have a start limit of 2. This is useful because, if the step continually fails, a new exit code is returned to the operators that control job execution, and it cannot start again until manual intervention has taken place.

The remainder of this section describes what happens for each of the three runs of the
footballJob
example.

Run 1:

1. playerLoad runs and completes successfully, since it has not been run before.
2. gameLoad runs and processes 11 files worth of game data, loading their contents into the GAMES table.
3. playerSummarization begins processing and fails after 5 minutes.

Run 2:

1. playerLoad does not run, since it has already completed successfully, and allow-start-if-complete is false (the default).
2. gameLoad runs again and processes another 2 files, loading their contents into the GAMES table as well (with a process indicator indicating they have yet to be processed).
3. playerSummarization begins processing of all remaining game data (filtering using the process indicator) and fails again after 30 minutes.

Run 3:

1. playerLoad does not run, since it has already completed successfully, and allow-start-if-complete is false (the default).
2. gameLoad runs again and processes another 2 files, loading their contents into the GAMES table as well (with a process indicator indicating they have yet to be processed).
3. playerSummarization is not started and the job is immediately killed, since this is the third execution of playerSummarization, and its limit is only 2. Either the limit must be raised or the Job must be executed as a new JobInstance.
5.1.5. Configuring Skip Logic
There are many scenarios where errors encountered while processing should not result in
Step
failure but should be skipped instead. This is usually a decision that must be made by someone who understands the data itself and what meaning it has. Financial data, for example, may not be skippable because it results in money being transferred, which needs to be completely accurate. Loading a list of vendors, on the other hand, might allow for skips. If a vendor is not loaded because it was formatted incorrectly or was missing necessary information, there probably are not issues. Usually, these bad records are logged as well, which is covered later when discussing listeners.The following XML example shows an example of using a skip limit:
XML Configuration<step id="step1"> <tasklet> <chunk reader="flatFileItemReader" writer="itemWriter" commit-interval="10" skip-limit="10"> <skippable-exception-classes> <include class="org.springframework.batch.item.file.FlatFileParseException"/> </skippable-exception-classes> </chunk> </tasklet> </step>
The following Java example shows an example of using a skip limit:
Java Configuration@Bean public Step step1(JobRepository jobRepository, PlatformTransactionManager transactionManager) { return new StepBuilder("step1", jobRepository) .<String, String>chunk(10, transactionManager) .reader(flatFileItemReader()) .writer(itemWriter()) .faultTolerant() .skipLimit(10) .skip(FlatFileParseException.class) .build();
In the preceding example, a
FlatFileItemReader
is used. If, at any point, aFlatFileParseException
is thrown, the item is skipped and counted against the total skip limit of 10. Exceptions (and their subclasses) that are declared might be thrown during any phase of the chunk processing (read, process, or write). Separate counts are made of skips on read, process, and write inside the step execution, but the limit applies across all skips. Once the skip limit is reached, the next exception found causes the step to fail. In other words, the eleventh skip triggers the exception, not the tenth.One problem with the preceding example is that any other exception besides a
FlatFileParseException
causes theJob
to fail. In certain scenarios, this may be the correct behavior. However, in other scenarios, it may be easier to identify which exceptions should cause failure and skip everything else.The following XML example shows an example excluding a particular exception:
XML Configuration<step id="step1"> <tasklet> <chunk reader="flatFileItemReader" writer="itemWriter" commit-interval="10" skip-limit="10"> <skippable-exception-classes> <include class="java.lang.Exception"/> <exclude class="java.io.FileNotFoundException"/> </skippable-exception-classes> </chunk> </tasklet> </step>
The following Java example shows an example excluding a particular exception:
Java Configuration
@Bean public Step step1(JobRepository jobRepository, PlatformTransactionManager transactionManager) { return new StepBuilder("step1", jobRepository) .<String, String>chunk(10, transactionManager) .reader(flatFileItemReader()) .writer(itemWriter()) .faultTolerant() .skipLimit(10) .skip(Exception.class) .noSkip(FileNotFoundException.class) .build();
By identifying
java.lang.Exception
as a skippable exception class, the configuration indicates that allExceptions
are skippable. However, by “excluding”java.io.FileNotFoundException
, the configuration refines the list of skippable exception classes to be allExceptions
exceptFileNotFoundException
. Any excluded exception class is fatal if encountered (that is, they are not skipped).For any exception encountered, the skippability is determined by the nearest superclass in the class hierarchy. Any unclassified exception is treated as 'fatal'.
The order of the
<include/>
and<exclude/>
elements does not matter.The order of the
skip
andnoSkip
method calls does not matter.
5.1.6. Configuring Retry Logic
In most cases, you want an exception to cause either a skip or a
Step
failure. However, not all exceptions are deterministic. If aFlatFileParseException
is encountered while reading, it is always thrown for that record. Resetting theItemReader
does not help. However, for other exceptions (such as aDeadlockLoserDataAccessException
, which indicates that the current process has attempted to update a record that another process holds a lock on), waiting and trying again might result in success.In XML, retry should be configured as follows:
<step id="step1"> <tasklet> <chunk reader="itemReader" writer="itemWriter" commit-interval="2" retry-limit="3"> <retryable-exception-classes> <include class="org.springframework.dao.DeadlockLoserDataAccessException"/> </retryable-exception-classes> </chunk> </tasklet> </step>
In Java, retry should be configured as follows:
@Bean public Step step1(JobRepository jobRepository, PlatformTransactionManager transactionManager) { return new StepBuilder("step1", jobRepository) .<String, String>chunk(2, transactionManager) .reader(itemReader()) .writer(itemWriter()) .faultTolerant() .retryLimit(3) .retry(DeadlockLoserDataAccessException.class) .build();
The
Step
allows a limit for the number of times an individual item can be retried and a list of exceptions that are “retryable”. You can find more details on how retry works in the retry section.

5.1.7. Controlling Rollback
By default, regardless of retry or skip, any exceptions thrown from the
ItemWriter
cause the transaction controlled by theStep
to rollback. If skip is configured as described earlier, exceptions thrown from theItemReader
do not cause a rollback. However, there are many scenarios in which exceptions thrown from theItemWriter
should not cause a rollback, because no action has taken place to invalidate the transaction. For this reason, you can configure theStep
with a list of exceptions that should not cause rollback.In XML, you can control rollback as follows:
XML Configuration<step id="step1"> <tasklet> <chunk reader="itemReader" writer="itemWriter" commit-interval="2"/> <no-rollback-exception-classes> <include class="org.springframework.batch.item.validator.ValidationException"/> </no-rollback-exception-classes> </tasklet> </step>
In Java, you can control rollback as follows:
Java Configuration@Bean public Step step1(JobRepository jobRepository, PlatformTransactionManager transactionManager) { return new StepBuilder("step1", jobRepository) .<String, String>chunk(2, transactionManager) .reader(itemReader()) .writer(itemWriter()) .faultTolerant() .noRollback(ValidationException.class) .build();
Transactional Readers
The basic contract of the
ItemReader
is that it is forward-only. The step buffers reader input so that, in case of a rollback, the items do not need to be re-read from the reader. However, there are certain scenarios in which the reader is built on top of a transactional resource, such as a JMS queue. In this case, since the queue is tied to the transaction that is rolled back, the messages that have been pulled from the queue are put back on. For this reason, you can configure the step to not buffer the items.The following example shows how to create a reader that does not buffer items in XML:
XML Configuration<step id="step1"> <tasklet> <chunk reader="itemReader" writer="itemWriter" commit-interval="2" is-reader-transactional-queue="true"/> </tasklet> </step>
The following example shows how to create a reader that does not buffer items in Java:
Java Configuration@Bean public Step step1(JobRepository jobRepository, PlatformTransactionManager transactionManager) { return new StepBuilder("step1", jobRepository) .<String, String>chunk(2, transactionManager) .reader(itemReader()) .writer(itemWriter()) .readerIsTransactionalQueue() .build();
5.1.8. Transaction Attributes
You can use transaction attributes to control the
isolation
,propagation
, andtimeout
settings. You can find more information on setting transaction attributes in Spring core documentation.The following example sets the
isolation
,propagation
, andtimeout
transaction attributes in XML:XML Configuration<step id="step1"> <tasklet> <chunk reader="itemReader" writer="itemWriter" commit-interval="2"/> <transaction-attributes isolation="DEFAULT" propagation="REQUIRED" timeout="30"/> </tasklet> </step>
The following example sets the
isolation
,propagation
, andtimeout
transaction attributes in Java:Java Configuration@Bean public Step step1(JobRepository jobRepository, PlatformTransactionManager transactionManager) { DefaultTransactionAttribute attribute = new DefaultTransactionAttribute(); attribute.setPropagationBehavior(Propagation.REQUIRED.value()); attribute.setIsolationLevel(Isolation.DEFAULT.value()); attribute.setTimeout(30); return new StepBuilder("step1", jobRepository) .<String, String>chunk(2, transactionManager) .reader(itemReader()) .writer(itemWriter()) .transactionAttribute(attribute) .build();
5.1.9. Registering
ItemStream
with aStep
The step has to take care of
ItemStream
callbacks at the necessary points in its lifecycle. (For more information on theItemStream
interface, see ItemStream). This is vital if a step fails and might need to be restarted, because theItemStream
interface is where the step gets the information it needs about persistent state between executions.If the
ItemReader
,ItemProcessor
, orItemWriter
itself implements theItemStream
interface, these are registered automatically. Any other streams need to be registered separately. This is often the case where indirect dependencies, such as delegates, are injected into the reader and writer. You can register a stream on thestep
through thestream
element.The following example shows how to register a
stream
on astep
in XML:XML Configuration<step id="step1"> <tasklet> <chunk reader="itemReader" writer="compositeWriter" commit-interval="2"> <streams> <stream ref="fileItemWriter1"/> <stream ref="fileItemWriter2"/> </streams> </chunk> </tasklet> </step> <beans:bean id="compositeWriter" class="org.springframework.batch.item.support.CompositeItemWriter"> <beans:property name="delegates"> <beans:list> <beans:ref bean="fileItemWriter1" /> <beans:ref bean="fileItemWriter2" /> </beans:list> </beans:property> </beans:bean>
The following example shows how to register a
stream
on astep
in Java:Java Configuration@Bean public Step step1(JobRepository jobRepository, PlatformTransactionManager transactionManager) { return new StepBuilder("step1", jobRepository) .<String, String>chunk(2, transactionManager) .reader(itemReader()) .writer(compositeItemWriter()) .stream(fileItemWriter1()) .stream(fileItemWriter2()) .build(); * In Spring Batch 4, the CompositeItemWriter implements ItemStream so this isn't * necessary, but used for an example. @Bean public CompositeItemWriter compositeItemWriter() { List<ItemWriter> writers = new ArrayList<>(2); writers.add(fileItemWriter1()); writers.add(fileItemWriter2()); CompositeItemWriter itemWriter = new CompositeItemWriter(); itemWriter.setDelegates(writers); return itemWriter;
In the preceding example, the
CompositeItemWriter
is not anItemStream
, but both of its delegates are. Therefore, both delegate writers must be explicitly registered as streams for the framework to handle them correctly. TheItemReader
does not need to be explicitly registered as a stream because it is a direct property of theStep
. The step is now restartable, and the state of the reader and writer is correctly persisted in the event of a failure.

5.1.10. Intercepting Step Execution

Just as with the
Job
, there are many events during the execution of aStep
where a user may need to perform some functionality. For example, to write out to a flat file that requires a footer, theItemWriter
needs to be notified when theStep
has been completed so that the footer can be written. This can be accomplished with one of manyStep
scoped listeners.You can apply any class that implements one of the extensions of
StepListener
(but not that interface itself, since it is empty) to a step through thelisteners
element. Thelisteners
element is valid inside a step, tasklet, or chunk declaration. We recommend that you declare the listeners at the level at which its function applies or, if it is multi-featured (such asStepExecutionListener
andItemReadListener
), declare it at the most granular level where it applies.The following example shows a listener applied at the chunk level in XML:
XML Configuration<step id="step1"> <tasklet> <chunk reader="reader" writer="writer" commit-interval="10"/> <listeners> <listener ref="chunkListener"/> </listeners> </tasklet> </step>
The following example shows a listener applied at the chunk level in Java:
Java Configuration@Bean public Step step1(JobRepository jobRepository, PlatformTransactionManager transactionManager) { return new StepBuilder("step1", jobRepository) .<String, String>chunk(10, transactionManager) .reader(reader()) .writer(writer()) .listener(chunkListener()) .build();
An
ItemReader
,ItemWriter
, orItemProcessor
that itself implements one of theStepListener
interfaces is registered automatically with theStep
if using the namespace<step>
element or one of the*StepFactoryBean
factories. This only applies to components directly injected into theStep
. If the listener is nested inside another component, you need to explicitly register it (as described previously under RegisteringItemStream
with aStep
).In addition to the
StepListener
interfaces, annotations are provided to address the same concerns. Plain old Java objects can have methods with these annotations that are then converted into the correspondingStepListener
type. It is also common to annotate custom implementations of chunk components, such asItemReader
orItemWriter
orTasklet
. The annotations are analyzed by the XML parser for the<listener/>
elements as well as registered with thelistener
methods in the builders, so all you need to do is use the XML namespace or builders to register the listeners with a step.
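For example, the following plain Java object (a minimal sketch; the class name and the log output are illustrative, not taken from the reference samples) is converted into a StepExecutionListener when it is registered as a listener:

import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.annotation.AfterStep;
import org.springframework.batch.core.annotation.BeforeStep;

public class AuditingStepListener {

    @BeforeStep
    public void beforeStep(StepExecution stepExecution) {
        // Called before the step starts.
        System.out.println("About to run step " + stepExecution.getStepName());
    }

    @AfterStep
    public ExitStatus afterStep(StepExecution stepExecution) {
        // Called after the step ends. Returning null leaves the exit status unchanged.
        System.out.println("Step ended with status " + stepExecution.getStatus());
        return null;
    }
}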
StepExecutionListener
StepExecutionListener
represents the most generic listener forStep
execution. It allows for notification before aStep
is started and after it ends, whether it ended normally or failed, as the following example shows:

public interface StepExecutionListener extends StepListener {

    void beforeStep(StepExecution stepExecution);

    ExitStatus afterStep(StepExecution stepExecution);

}

afterStep has a return type of ExitStatus, to give listeners the chance to modify the exit code that is returned upon completion of a Step.

The annotations corresponding to this interface are @BeforeStep and @AfterStep.
ChunkListener
A “chunk” is defined as the items processed within the scope of a transaction. Committing a transaction, at each commit interval, commits a chunk. You can use a
ChunkListener
to perform logic before a chunk begins processing or after a chunk has completed successfully, as the following interface definition shows:

public interface ChunkListener extends StepListener {

    void beforeChunk(ChunkContext context);

    void afterChunk(ChunkContext context);

    void afterChunkError(ChunkContext context);

}
The beforeChunk method is called after the transaction is started but before reading begins on the
ItemReader
. Conversely,afterChunk
is called after the chunk has been committed (or not at all if there is a rollback).

The annotations corresponding to this interface are @BeforeChunk, @AfterChunk, and @AfterChunkError.
You can apply a
ChunkListener
when there is no chunk declaration. TheTaskletStep
is responsible for calling theChunkListener
, so it applies to a non-item-oriented tasklet as well (it is called before and after the tasklet).
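As a rough sketch (the class name and the timing logic are illustrative, not part of the framework), a ChunkListener might be used to time each chunk:

import org.springframework.batch.core.ChunkListener;
import org.springframework.batch.core.scope.context.ChunkContext;

public class ChunkTimingListener implements ChunkListener {

    // Storing state in a field like this assumes a single-threaded step; shown only as a sketch.
    private long start;

    @Override
    public void beforeChunk(ChunkContext context) {
        // Called after the chunk transaction has started, before the first read.
        start = System.currentTimeMillis();
    }

    @Override
    public void afterChunk(ChunkContext context) {
        // Called after the chunk has been committed.
        System.out.println("Chunk committed in " + (System.currentTimeMillis() - start) + " ms");
    }

    @Override
    public void afterChunkError(ChunkContext context) {
        // Called when the chunk fails and the transaction is rolled back.
        System.out.println("Chunk failed after " + (System.currentTimeMillis() - start) + " ms");
    }
}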
ItemReadListener
When discussing skip logic previously, it was mentioned that it may be beneficial to log the skipped records so that they can be dealt with later. In the case of read errors, this can be done with an
ItemReadListener, as the following interface definition shows:

public interface ItemReadListener<T> extends StepListener {

    void beforeRead();

    void afterRead(T item);

    void onReadError(Exception ex);

}
The
beforeRead
method is called before each call to read on theItemReader
. TheafterRead
method is called after each successful call to read and is passed the item that was read. If there was an error while reading, theonReadError
method is called. The exception encountered is provided so that it can be logged.

The annotations corresponding to this interface are @BeforeRead, @AfterRead, and @OnReadError.
ItemProcessListener
As with the
ItemReadListener
, the processing of an item can be “listened” to, as the following interface definition shows:

public interface ItemProcessListener<T, S> extends StepListener {

    void beforeProcess(T item);

    void afterProcess(T item, S result);

    void onProcessError(T item, Exception e);

}
The
beforeProcess
method is called beforeprocess
on theItemProcessor
and is handed the item that is to be processed. TheafterProcess
method is called after the item has been successfully processed. If there was an error while processing, theonProcessError
method is called. The exception encountered and the item that was attempted to be processed are provided, so that they can be logged.

The annotations corresponding to this interface are @BeforeProcess, @AfterProcess, and @OnProcessError.
ItemWriteListener
You can “listen” to the writing of an item with the
ItemWriteListener
, as the following interface definition shows:

public interface ItemWriteListener<S> extends StepListener {

    void beforeWrite(List<? extends S> items);

    void afterWrite(List<? extends S> items);

    void onWriteError(Exception exception, List<? extends S> items);

}
The
beforeWrite
method is called beforewrite
on theItemWriter
and is handed the list of items that is written. TheafterWrite
method is called after the item has been successfully written. If there was an error while writing, theonWriteError
method is called. The exception encountered and the items that were attempted to be written are provided, so that they can be logged.

The annotations corresponding to this interface are @BeforeWrite, @AfterWrite, and @OnWriteError.
SkipListener
ItemReadListener
,ItemProcessListener
, andItemWriteListener
all provide mechanisms for being notified of errors, but none informs you that a record has actually been skipped.onWriteError
, for example, is called even if an item is retried and successful. For this reason, there is a separate interface for tracking skipped items, as the following interface definition shows:

public interface SkipListener<T,S> extends StepListener {

    void onSkipInRead(Throwable t);

    void onSkipInProcess(T item, Throwable t);

    void onSkipInWrite(S item, Throwable t);

}
onSkipInRead
is called whenever an item is skipped while reading. It should be noted that rollbacks may cause the same item to be registered as skipped more than once.onSkipInWrite
is called when an item is skipped while writing. Because the item has been read successfully (and not skipped), it is also provided the item itself as an argument.

The annotations corresponding to this interface are @OnSkipInRead, @OnSkipInWrite, and @OnSkipInProcess.
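For example, a listener along the following lines (a minimal sketch; the class name and the logging are illustrative) could record skipped items so that they can be reviewed later:

import org.springframework.batch.core.SkipListener;

public class LoggingSkipListener implements SkipListener<String, String> {

    @Override
    public void onSkipInRead(Throwable t) {
        // The item itself is not available here, because it could not be read.
        System.err.println("Skipped an unreadable record: " + t.getMessage());
    }

    @Override
    public void onSkipInProcess(String item, Throwable t) {
        System.err.println("Skipped item during processing: " + item);
    }

    @Override
    public void onSkipInWrite(String item, Throwable t) {
        System.err.println("Skipped item during writing: " + item);
    }
}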
SkipListeners and Transactions
One of the most common use cases for a
SkipListener
is to log out a skipped item, so that another batch process or even human process can be used to evaluate and fix the issue that leads to the skip. Because there are many cases in which the original transaction may be rolled back, Spring Batch makes two guarantees:The appropriate skip method (depending on when the error happened) is called only once per item.
The
SkipListener
is always called just before the transaction is committed. This is to ensure that any transactional resources called by the listener are not rolled back by a failure within the ItemWriter
.

5.2. TaskletStep

Chunk-oriented processing is not the only way to process in a
Step
. What if aStep
must consist of a stored procedure call? You could implement the call as anItemReader
and return null after the procedure finishes. However, doing so is a bit unnatural, since there would need to be a no-opItemWriter
. Spring Batch provides theTaskletStep
for this scenario.The
Tasklet
interface has one method,execute
, which is called repeatedly by theTaskletStep
until it either returnsRepeatStatus.FINISHED
or throws an exception to signal a failure. Each call to aTasklet
is wrapped in a transaction.Tasklet
implementors might call a stored procedure, a script, or a SQL update statement.To create a
TaskletStep
in XML, theref
attribute of the<tasklet/>
element should reference a bean that defines aTasklet
object. No<chunk/>
element should be used within the<tasklet/>
. The following example shows a simple tasklet:<step id="step1"> <tasklet ref="myTasklet"/> </step>
To create a
TaskletStep
in Java, the bean passed to thetasklet
method of the builder should implement theTasklet
interface. No call tochunk
should be called when building aTaskletStep
. The following example shows a simple tasklet:@Bean public Step step1(JobRepository jobRepository, PlatformTransactionManager transactionManager) { return new StepBuilder("step1", jobRepository) .tasklet(myTasklet(), transactionManager) .build();
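The myTasklet bean referenced above must implement Tasklet. A minimal sketch (the class name and the work performed in execute are illustrative) might look like this:

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class MyTasklet implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        // Perform the single unit of work for this step (for example, invoke a cleanup script).
        System.out.println("Running my tasklet");
        // Returning FINISHED tells the TaskletStep not to call execute again.
        return RepeatStatus.FINISHED;
    }
}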
5.2.1.
TaskletAdapter
As with other adapters for the
ItemReader
andItemWriter
interfaces, theTasklet
interface contains an implementation that allows for adapting itself to any pre-existing class:TaskletAdapter
. An example where this may be useful is an existing DAO that is used to update a flag on a set of records. You can use theTaskletAdapter
to call this class without having to write an adapter for theTasklet
interface.The following example shows how to define a
TaskletAdapter
in XML:XML Configuration<bean id="myTasklet" class="o.s.b.core.step.tasklet.MethodInvokingTaskletAdapter"> <property name="targetObject"> <bean class="org.mycompany.FooDao"/> </property> <property name="targetMethod" value="updateFoo" /> </bean>
The following example shows how to define a
TaskletAdapter
in Java:Java Configuration@Bean public MethodInvokingTaskletAdapter myTasklet() { MethodInvokingTaskletAdapter adapter = new MethodInvokingTaskletAdapter(); adapter.setTargetObject(fooDao()); adapter.setTargetMethod("updateFoo"); return adapter;
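The target object above is an ordinary bean; a minimal sketch of the assumed DAO (the class name, method, and update logic are illustrative) might be:

public class FooDao {

    public void updateFoo() {
        // Update the flag on the relevant set of records, for example through JDBC.
        System.out.println("Updating foo records");
    }
}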
5.2.2. Example
Tasklet
Implementation

Many batch jobs contain steps that must be done before the main processing begins (to set up various resources) or after processing has completed (to clean up those resources). In the case of a job that works heavily with files, it is often necessary to delete certain files locally after they have been uploaded successfully to another location. The following example (taken from the Spring Batch samples project) is a
Tasklet
implementation with just such a responsibility:

public class FileDeletingTasklet implements Tasklet, InitializingBean {

    private Resource directory;

    public RepeatStatus execute(StepContribution contribution,
                                ChunkContext chunkContext) throws Exception {
        File dir = directory.getFile();
        Assert.state(dir.isDirectory());

        File[] files = dir.listFiles();
        for (int i = 0; i < files.length; i++) {
            boolean deleted = files[i].delete();
            if (!deleted) {
                throw new UnexpectedJobExecutionException("Could not delete file " +
                                                          files[i].getPath());
            }
        }
        return RepeatStatus.FINISHED;
    }

    public void setDirectoryResource(Resource directory) {
        this.directory = directory;
    }

    public void afterPropertiesSet() throws Exception {
        Assert.state(directory != null, "directory must be set");
    }
}
The preceding
tasklet
implementation deletes all files within a given directory. It should be noted that theexecute
method is called only once. All that is left is to reference thetasklet
from thestep
.The following example shows how to reference the
tasklet
from thestep
in XML:XML Configuration<job id="taskletJob"> <step id="deleteFilesInDir"> <tasklet ref="fileDeletingTasklet"/> </step> <beans:bean id="fileDeletingTasklet" class="org.springframework.batch.sample.tasklet.FileDeletingTasklet"> <beans:property name="directoryResource"> <beans:bean id="directory" class="org.springframework.core.io.FileSystemResource"> <beans:constructor-arg value="target/test-outputs/test-dir" /> </beans:bean> </beans:property> </beans:bean>
The following example shows how to reference the
tasklet
from thestep
in Java:Java Configuration@Bean public Job taskletJob(JobRepository jobRepository) { return new JobBuilder("taskletJob", jobRepository) .start(deleteFilesInDir()) .build(); @Bean public Step deleteFilesInDir(JobRepository jobRepository, PlatformTransactionManager transactionManager) { return new StepBuilder("deleteFilesInDir", jobRepository) .tasklet(fileDeletingTasklet(), transactionManager) .build(); @Bean public FileDeletingTasklet fileDeletingTasklet() { FileDeletingTasklet tasklet = new FileDeletingTasklet(); tasklet.setDirectoryResource(new FileSystemResource("target/test-outputs/test-dir")); return tasklet;
5.3. Controlling Step Flow
With the ability to group steps together within an owning job comes the need to be able to control how the job “flows” from one step to another. The failure of a
Step
does not necessarily mean that theJob
should fail. Furthermore, there may be more than one type of “success” that determines whichStep
should be executed next. Depending upon how a group ofSteps
is configured, certain steps may not even be processed at all.

5.3.1. Sequential Flow
The simplest flow scenario is a job where all of the steps execute sequentially. The following example shows such a flow in XML:
<job id="job"> <step id="stepA" parent="s1" next="stepB" /> <step id="stepB" parent="s2" next="stepC"/> <step id="stepC" parent="s3" />
The following example shows how to use the
next()
method in Java:Java Configuration
@Bean public Job job(JobRepository jobRepository) { return new JobBuilder("job", jobRepository) .start(stepA()) .next(stepB()) .next(stepC()) .build();
In the scenario above, stepA runs first because it is the first Step listed. If stepA completes normally, stepB runs, and so on. However, if stepA fails, the entire Job fails and stepB does not execute.

With the Spring Batch XML namespace, the first step listed in the configuration is always the first step run by the Job. The order of the other step elements does not matter, but the first step must always appear first in the XML.

5.3.2. Conditional Flow

In many cases, this may be sufficient. However, what about a scenario in which the failure of a
step
should trigger a differentstep
, rather than causing failure?

To handle more complex scenarios, the Spring Batch XML namespace lets you define transition elements within the step element. One such transition is the
next
element. Like thenext
attribute, thenext
element tells theJob
whichStep
to execute next. However, unlike the attribute, any number ofnext
elements are allowed on a givenStep
, and there is no default behavior in the case of failure. This means that, if transition elements are used, all of the behavior for theStep
transitions must be defined explicitly. Note also that a single step cannot have both anext
attribute and atransition
element.The
next
element specifies a pattern to match and the step to execute next, as the following example shows:XML Configuration
<job id="job"> <step id="stepA" parent="s1"> <next on="*" to="stepB" /> <next on="FAILED" to="stepC" /> </step> <step id="stepB" parent="s2" next="stepC" /> <step id="stepC" parent="s3" />
The Java API offers a fluent set of methods that let you specify the flow and what to do when a step fails. The following example shows how to specify one step (
stepA
) and then proceed to either of two different steps (stepB
orstepC
), depending on whetherstepA
succeeds:Java Configuration
@Bean public Job job(JobRepository jobRepository) { return new JobBuilder("job", jobRepository) .start(stepA()) .on("*").to(stepB()) .from(stepA()).on("FAILED").to(stepC()) .end() .build();
When using XML configuration, the
on
attribute of a transition element uses a simple pattern-matching scheme to match theExitStatus
that results from the execution of theStep
.When using java configuration, the
on()
method uses a simple pattern-matching scheme to match theExitStatus
that results from the execution of theStep
.

Only two special characters are allowed in the pattern: * matches zero or more characters, and ? matches exactly one character. For example, “c*t” matches “cat” and “count”, while “c?t” matches “cat” but not “count”.
While there is no limit to the number of transition elements on a
Step
, if theStep
execution results in anExitStatus
that is not covered by an element, the framework throws an exception and theJob
fails. The framework automatically orders transitions from most specific to least specific. This means that, even if the ordering were swapped forstepA
in the preceding example, anExitStatus
ofFAILED
would still go tostepC
.
Batch Status Versus Exit Status
When configuring a
Job
for conditional flow, it is important to understand the difference betweenBatchStatus
andExitStatus
.BatchStatus
is an enumeration that is a property of bothJobExecution
andStepExecution
and is used by the framework to record the status of aJob
orStep
. It can be one of the following values:COMPLETED
,STARTING
,STARTED
,STOPPING
,STOPPED
,FAILED
,ABANDONED
, orUNKNOWN
. Most of them are self explanatory:COMPLETED
is the status set when a step or job has completed successfully,FAILED
is set when it fails, and so on.The following example contains the
next
element when using XML configuration:<next on="FAILED" to="stepB" />
The following example contains the on element when using Java Configuration:

.from(stepA()).on("FAILED").to(stepB())

At first glance, it would appear that
on
references theBatchStatus
of theStep
to which it belongs. However, it actually references theExitStatus
of theStep
. As the name implies,ExitStatus
represents the status of aStep
after it finishes execution.More specifically, when using XML configuration, the
next
element shown in the preceding XML configuration example references the exit code ofExitStatus
.When using Java configuration, the
on()
method shown in the preceding Java configuration example references the exit code ofExitStatus
.In English, it says: “go to stepB if the exit code is FAILED”. By default, the exit code is always the same as the
BatchStatus
for theStep
, which is why the preceding entry works. However, what if the exit code needs to be different? A good example comes from the skip sample job within the samples project:The following example shows how to work with a different exit code in XML:
XML Configuration<step id="step1" parent="s1"> <end on="FAILED" /> <next on="COMPLETED WITH SKIPS" to="errorPrint1" /> <next on="*" to="step2" /> </step>
The following example shows how to work with a different exit code in Java:
Java Configuration@Bean public Job job(JobRepository jobRepository) { return new JobBuilder("job", jobRepository) .start(step1()).on("FAILED").end() .from(step1()).on("COMPLETED WITH SKIPS").to(errorPrint1()) .from(step1()).on("*").to(step2()) .end() .build();
The preceding configuration works. However, something needs to change the exit code based on the condition of the execution having skipped records, as the following example shows:
public class SkipCheckingListener extends StepExecutionListenerSupport {

    public ExitStatus afterStep(StepExecution stepExecution) {
        String exitCode = stepExecution.getExitStatus().getExitCode();
        if (!exitCode.equals(ExitStatus.FAILED.getExitCode()) &&
              stepExecution.getSkipCount() > 0) {
            return new ExitStatus("COMPLETED WITH SKIPS");
        }
        else {
            return null;
        }
    }
}
The preceding code is a
StepExecutionListener
that first checks to make sure theStep
was successful and then checks to see if the skip count on theStepExecution
is higher than 0. If both conditions are met, a newExitStatus
with an exit code ofCOMPLETED WITH SKIPS
is returned.
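For the new exit code to be visible to the flow, the listener must be registered on the step whose status it adjusts. A minimal sketch (assuming the SkipCheckingListener shown above is available as a bean, along with the itemReader and itemWriter beans used throughout this chapter) might look like this:

@Bean
public Step step1(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
    return new StepBuilder("step1", jobRepository)
            .<String, String>chunk(10, transactionManager)
            .reader(itemReader())
            .writer(itemWriter())
            // Registers the StepExecutionListener that may replace the exit status.
            .listener(skipCheckingListener())
            .build();
}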
5.3.3. Configuring for Stop

After the discussion of
BatchStatus
andExitStatus
, one might wonder how theBatchStatus
andExitStatus
are determined for theJob
. While these statuses are determined for theStep
by the code that is executed, the statuses for theJob
are determined based on the configuration.So far, all of the job configurations discussed have had at least one final
Step
with no transitions.In the following XML example, after the
step
executes, theJob
ends:<step id="stepC" parent="s3"/>
In the following Java example, after the
step
executes, theJob
ends:@Bean public Job job(JobRepository jobRepository) { return new JobBuilder("job", jobRepository) .start(step1()) .build();
If the
Step
ends withExitStatus
ofFAILED
, theBatchStatus
andExitStatus
of theJob
are bothFAILED
.Otherwise, the
BatchStatus
andExitStatus
of theJob
are bothCOMPLETED
.While this method of terminating a batch job is sufficient for some batch jobs, such as a simple sequential step job, custom defined job-stopping scenarios may be required. For this purpose, Spring Batch provides three transition elements to stop a
Job
(in addition to thenext
element that we discussed previously). Each of these stopping elements stops aJob
with a particularBatchStatus
. It is important to note that the stop transition elements have no effect on either theBatchStatus
orExitStatus
of anySteps
in theJob
. These elements affect only the final statuses of theJob
. For example, it is possible for every step in a job to have a status ofFAILED
but for the job to have a status ofCOMPLETED
.Ending at a Step
Configuring a step end instructs a
Job
to stop with aBatchStatus
ofCOMPLETED
. AJob
that has finished with a status ofCOMPLETED
cannot be restarted (the framework throws aJobInstanceAlreadyCompleteException
).When using XML configuration, you can use the
end
element for this task. Theend
element also allows for an optionalexit-code
attribute that you can use to customize theExitStatus
of theJob
. If noexit-code
attribute is given, theExitStatus
isCOMPLETED
by default, to match theBatchStatus
.When using Java configuration, the
end
method is used for this task. Theend
method also allows for an optionalexitStatus
parameter that you can use to customize theExitStatus
of theJob
. If noexitStatus
value is provided, theExitStatus
isCOMPLETED
by default, to match theBatchStatus
.Consider the following scenario: If
step2
fails, theJob
stops with aBatchStatus
ofCOMPLETED
and anExitStatus
ofCOMPLETED
, andstep3
does not run. Otherwise, execution moves tostep3
. Note that ifstep2
fails, theJob
is not restartable (because the status isCOMPLETED
).The following example shows the scenario in XML:
<step id="step1" parent="s1" next="step2"> <step id="step2" parent="s2"> <end on="FAILED"/> <next on="*" to="step3"/> </step> <step id="step3" parent="s3">
The following example shows the scenario in Java:
@Bean public Job job(JobRepository jobRepository) { return new JobBuilder("job", jobRepository) .start(step1()) .next(step2()) .on("FAILED").end() .from(step2()).on("*").to(step3()) .end() .build();
Failing a Step

Configuring a step to fail at a given point instructs a
Job
to stop with aBatchStatus
ofFAILED
. Unlike end, the failure of aJob
does not prevent theJob
from being restarted.When using XML configuration, the
fail
element also allows for an optionalexit-code
attribute that can be used to customize theExitStatus
of theJob
. If noexit-code
attribute is given, theExitStatus
isFAILED
by default, to match theBatchStatus
.Consider the following scenario: If
step2
fails, theJob
stops with aBatchStatus
ofFAILED
and anExitStatus
ofEARLY TERMINATION
andstep3
does not execute. Otherwise, execution moves tostep3
. Additionally, ifstep2
fails and theJob
is restarted, execution begins again onstep2
.The following example shows the scenario in XML:
XML Configuration<step id="step1" parent="s1" next="step2"> <step id="step2" parent="s2"> <fail on="FAILED" exit-code="EARLY TERMINATION"/> <next on="*" to="step3"/> </step> <step id="step3" parent="s3">
The following example shows the scenario in Java:
Java Configuration
@Bean public Job job(JobRepository jobRepository) { return new JobBuilder("job", jobRepository) .start(step1()) .next(step2()).on("FAILED").fail() .from(step2()).on("*").to(step3()) .end() .build();
Stopping a Job at a Given Step
Configuring a job to stop at a particular step instructs a
Job
to stop with aBatchStatus
ofSTOPPED
. Stopping aJob
can provide a temporary break in processing, so that the operator can take some action before restarting theJob
.When using XML configuration, a
stop
element requires arestart
attribute that specifies the step where execution should pick up when theJob
is restarted.When using Java configuration, the
stopAndRestart
method requires arestart
attribute that specifies the step where execution should pick up when the Job is restarted.Consider the following scenario: If
step1
finishes with COMPLETED
, the job then stops. Once it is restarted, execution begins onstep2
.The following listing shows the scenario in XML:
<step id="step1" parent="s1"> <stop on="COMPLETED" restart="step2"/> </step> <step id="step2" parent="s2"/>
The following example shows the scenario in Java:
@Bean public Job job(JobRepository jobRepository) { return new JobBuilder("job", jobRepository) .start(step1()).on("COMPLETED").stopAndRestart(step2()) .end() .build();
5.3.4. Programmatic Flow Decisions
In some situations, more information than the
ExitStatus
may be required to decide which step to execute next. In this case, aJobExecutionDecider
can be used to assist in the decision, as the following example shows:

public class MyDecider implements JobExecutionDecider {
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        String status;
        if (someCondition()) {
            status = "FAILED";
        }
        else {
            status = "COMPLETED";
        }
        return new FlowExecutionStatus(status);
    }
}
In the following sample job configuration, a
decision
specifies the decider to use as well as all of the transitions:XML Configuration<job id="job"> <step id="step1" parent="s1" next="decision" /> <decision id="decision" decider="decider"> <next on="FAILED" to="step2" /> <next on="COMPLETED" to="step3" /> </decision> <step id="step2" parent="s2" next="step3"/> <step id="step3" parent="s3" /> <beans:bean id="decider" class="com.MyDecider"/>
In the following example, a bean implementing the
JobExecutionDecider
is passed directly to thenext
call when using Java configuration:Java Configuration
@Bean public Job job(JobRepository jobRepository) { return new JobBuilder("job", jobRepository) .start(step1()) .next(decider()).on("FAILED").to(step2()) .from(decider()).on("COMPLETED").to(step3()) .end() .build();
5.3.5. Split Flows
Every scenario described so far has involved a
Job
that executes its steps one at a time in a linear fashion. In addition to this typical style, Spring Batch also allows for a job to be configured with parallel flows.The XML namespace lets you use the
split
element. As the following example shows, thesplit
element contains one or moreflow
elements, where entire separate flows can be defined. Asplit
element can also contain any of the previously discussed transition elements, such as thenext
attribute or thenext
,end
, orfail
elements.<split id="split1" next="step4"> <step id="step1" parent="s1" next="step2"/> <step id="step2" parent="s2"/> </flow> <step id="step3" parent="s3"/> </flow> </split> <step id="step4" parent="s4"/>
Java-based configuration lets you configure splits through the provided builders. As the following example shows, the
split
element contains one or moreflow
elements, where entire separate flows can be defined. Asplit
element can also contain any of the previously discussed transition elements, such as thenext
attribute or thenext
,end
, orfail
elements.@Bean public Flow flow1() { return new FlowBuilder<SimpleFlow>("flow1") .start(step1()) .next(step2()) .build(); @Bean public Flow flow2() { return new FlowBuilder<SimpleFlow>("flow2") .start(step3()) .build(); @Bean public Job job(Flow flow1, Flow flow2) { return this.jobBuilderFactory.get("job") .start(flow1) .split(new SimpleAsyncTaskExecutor()) .add(flow2) .next(step4()) .end() .build();
5.3.6. Externalizing Flow Definitions and Dependencies Between Jobs
Part of the flow in a job can be externalized as a separate bean definition and then re-used. There are two ways to do so. The first is to declare the flow as a reference to one defined elsewhere.
The following XML example shows how to declare a flow as a reference to a flow defined elsewhere:
XML Configuration<job id="job"> <flow id="job1.flow1" parent="flow1" next="step3"/> <step id="step3" parent="s3"/> <flow id="flow1"> <step id="step1" parent="s1" next="step2"/> <step id="step2" parent="s2"/> </flow>
The following Java example shows how to declare a flow as a reference to a flow defined elsewhere:
Java Configuration@Bean public Job job(JobRepository jobRepository) { return new JobBuilder("job", jobRepository) .start(flow1()) .next(step3()) .end() .build(); @Bean public Flow flow1() { return new FlowBuilder<SimpleFlow>("flow1") .start(step1()) .next(step2()) .build();
The effect of defining an external flow, as shown in the preceding example, is to insert the steps from the external flow into the job as if they had been declared inline. In this way, many jobs can refer to the same template flow and compose such templates into different logical flows. This is also a good way to separate the integration testing of the individual flows.
The other form of an externalized flow is to use a
JobStep
. AJobStep
is similar to aFlowStep
but actually creates and launches a separate job execution for the steps in the flow specified.The following example hows an example of a
JobStep
in XML:XML Configuration<job id="jobStepJob" restartable="true"> <step id="jobStepJob.step1"> <job ref="job" job-launcher="jobLauncher" job-parameters-extractor="jobParametersExtractor"/> </step> <job id="job" restartable="true">...</job> <bean id="jobParametersExtractor" class="org.spr...DefaultJobParametersExtractor"> <property name="keys" value="input.file"/> </bean>
The following example shows an example of a
JobStep
in Java:Java Configuration
@Bean public Job jobStepJob(JobRepository jobRepository) { return new JobBuilder("jobStepJob", jobRepository) .start(jobStepJobStep1(null)) .build(); @Bean public Step jobStepJobStep1(JobLauncher jobLauncher, JobRepository jobRepository) { return new StepBuilder("jobStepJobStep1", jobRepository) .job(job()) .launcher(jobLauncher) .parametersExtractor(jobParametersExtractor()) .build(); @Bean public Job job(JobRepository jobRepository) { return new JobBuilder("job", jobRepository) .start(step1()) .build(); @Bean public DefaultJobParametersExtractor jobParametersExtractor() { DefaultJobParametersExtractor extractor = new DefaultJobParametersExtractor(); extractor.setKeys(new String[]{"input.file"}); return extractor;
The job parameters extractor is a strategy that determines how the
ExecutionContext
for theStep
is converted intoJobParameters
for theJob
that is run. TheJobStep
is useful when you want to have some more granular options for monitoring and reporting on jobs and steps. UsingJobStep
is also often a good answer to the question: “How do I create dependencies between jobs?” It is a good way to break up a large system into smaller modules and control the flow of jobs.
5.4. Late Binding of
Job
andStep
Attributes

Both the XML and flat file examples shown earlier use the Spring
Resource
abstraction to obtain a file. This works becauseResource
has agetFile
method that returns ajava.io.File
. You can configure both XML and flat file resources by using standard Spring constructs:The following example shows late binding in XML:
XML Configuration<bean id="flatFileItemReader" class="org.springframework.batch.item.file.FlatFileItemReader"> <property name="resource" value="file://outputs/file.txt" /> </bean>
The following example shows late binding in Java:
Java Configuration@Bean public FlatFileItemReader flatFileItemReader() { FlatFileItemReader<Foo> reader = new FlatFileItemReaderBuilder<Foo>() .name("flatFileItemReader") .resource(new FileSystemResource("file://outputs/file.txt"))
The preceding
Resource
loads the file from the specified file system location. Note that absolute locations have to start with a double slash (//
). In most Spring applications, this solution is good enough, because the names of these resources are known at compile time. However, in batch scenarios, the file name may need to be determined at runtime as a parameter to the job. This can be solved using-D
parameters to read a system property.The following example shows how to read a file name from a property in XML:
XML Configuration<bean id="flatFileItemReader" class="org.springframework.batch.item.file.FlatFileItemReader"> <property name="resource" value="${input.file.name}" /> </bean>
The following shows how to read a file name from a property in Java:
Java Configuration
@Bean public FlatFileItemReader flatFileItemReader(@Value("${input.file.name}") String name) { return new FlatFileItemReaderBuilder<Foo>() .name("flatFileItemReader") .resource(new FileSystemResource(name)) Although you can use a
PropertyPlaceholderConfigurer
here, it is not necessary if the system property is always set because theResourceEditor
in Spring already filters and does placeholder replacement on system properties.Often, in a batch setting, it is preferable to parameterize the file name in the
JobParameters
of the job (instead of through system properties) and access them that way. To accomplish this, Spring Batch allows for the late binding of variousJob
andStep
attributes.The following example shows how to parameterize a file name in XML:
XML Configuration<bean id="flatFileItemReader" scope="step" class="org.springframework.batch.item.file.FlatFileItemReader"> <property name="resource" value="#{jobParameters['input.file.name']}" /> </bean>
The following example shows how to parameterize a file name in Java:
Java Configuration
@StepScope @Bean public FlatFileItemReader flatFileItemReader(@Value("#{jobParameters['input.file.name']}") String name) { return new FlatFileItemReaderBuilder<Foo>() .name("flatFileItemReader") .resource(new FileSystemResource(name))
You can access both the
JobExecution
andStepExecution
levelExecutionContext
in the same way.The following example shows how to access the
ExecutionContext
in XML:XML Configuration<bean id="flatFileItemReader" scope="step" class="org.springframework.batch.item.file.FlatFileItemReader"> <property name="resource" value="#{jobExecutionContext['input.file.name']}" /> </bean>
XML Configuration<bean id="flatFileItemReader" scope="step" class="org.springframework.batch.item.file.FlatFileItemReader"> <property name="resource" value="#{stepExecutionContext['input.file.name']}" /> </bean>
The following example shows how to access the
ExecutionContext
in Java:Java Configuration
@StepScope @Bean public FlatFileItemReader flatFileItemReader(@Value("#{jobExecutionContext['input.file.name']}") String name) { return new FlatFileItemReaderBuilder<Foo>() .name("flatFileItemReader") .resource(new FileSystemResource(name))
@StepScope @Bean public FlatFileItemReader flatFileItemReader(@Value("#{stepExecutionContext['input.file.name']}") String name) { return new FlatFileItemReaderBuilder<Foo>() .name("flatFileItemReader") .resource(new FileSystemResource(name)) Any bean that uses late binding must be declared with
scope="step"
. SeeStep Scope for more information. A
Step
bean should not be step-scoped. If late binding is needed in a step definition, the components of that step (tasklet, item reader or writer, and so on) are the ones that should be scoped instead. If you use Spring 3.0 (or above), the expressions in step-scoped beans are in the Spring Expression Language, a powerful general purpose language with many interesting features. To provide backward compatibility, if Spring Batch detects the presence of older versions of Spring, it uses a native expression language that is less powerful and that has slightly different parsing rules. The main difference is that the map keys in the example above do not need to be quoted with Spring 2.5, but the quotes are mandatory in Spring 3.0.

5.4.1. Step Scope
All of the late binding examples shown earlier have a scope of
step
declared on the bean definition.The following example shows an example of binding to step scope in XML:
XML Configuration<bean id="flatFileItemReader" scope="step" class="org.springframework.batch.item.file.FlatFileItemReader"> <property name="resource" value="#{jobParameters[input.file.name]}" /> </bean>
The following example shows an example of binding to step scope in Java:
Java Configuration@StepScope @Bean public FlatFileItemReader flatFileItemReader(@Value("#{jobParameters[input.file.name]}") String name) { return new FlatFileItemReaderBuilder<Foo>() .name("flatFileItemReader") .resource(new FileSystemResource(name))
Using a scope of Step is required to use late binding, because the bean cannot actually be instantiated until the Step starts, which lets the attributes be found. Because it is not part of the Spring container by default, the scope must be added explicitly, by using the batch namespace, by including a bean definition explicitly for the StepScope, or by using the @EnableBatchProcessing annotation. Use only one of those methods. The following example uses the batch namespace:

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:batch="http://www.springframework.org/schema/batch"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="...">

    <batch:job .../>

</beans>
The following example includes the bean definition explicitly:
<bean class="org.springframework.batch.core.scope.StepScope" />
5.4.2. Job Scope
Job scope, introduced in Spring Batch 3.0, is similar to Step scope in configuration but is a scope for the Job context, so that there is only one instance of such a bean per running job. Additionally, support is provided for late binding of references accessible from the JobContext by using #{..} placeholders. Using this feature, you can pull bean properties from the job or job execution context and the job parameters.

The following example shows an example of binding to job scope in XML:
XML Configuration

<bean id="..." class="..." scope="job">
    <property name="name" value="#{jobParameters[input]}" />
</bean>
XML Configuration

<bean id="..." class="..." scope="job">
    <property name="name" value="#{jobExecutionContext['input.name']}.txt" />
</bean>
The following example shows an example of binding to job scope in Java:
Java Configuration

@JobScope
@Bean
public FlatFileItemReader flatFileItemReader(@Value("#{jobParameters[input]}") String name) {
    return new FlatFileItemReaderBuilder<Foo>()
            .name("flatFileItemReader")
            .resource(new FileSystemResource(name))
            .build();
}
@JobScope
@Bean
public FlatFileItemReader flatFileItemReader(@Value("#{jobExecutionContext['input.name']}") String name) {
    return new FlatFileItemReaderBuilder<Foo>()
            .name("flatFileItemReader")
            .resource(new FileSystemResource(name))
            .build();
}
Because it is not part of the Spring container by default, the scope must be added explicitly, by using the batch namespace, by including a bean definition explicitly for the JobScope, or by using the @EnableBatchProcessing annotation (choose only one approach). The following example uses the batch namespace:

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:batch="http://www.springframework.org/schema/batch"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="...">

    <batch:job .../>

</beans>
The following example includes a bean that explicitly defines the JobScope:

<bean class="org.springframework.batch.core.scope.JobScope" />

There are some practical limitations of using job-scoped beans in multi-threaded or partitioned steps. Spring Batch does not control the threads spawned in these use cases, so it is not possible to set them up correctly to use such beans. Hence, we do not recommend using job-scoped beans in multi-threaded or partitioned steps.
5.4.3. Scoping ItemStream components

When using the Java configuration style to define job- or step-scoped ItemStream beans, the return type of the bean definition method should be at least ItemStream. This is required so that Spring Batch correctly creates a proxy that implements this interface, and therefore honors its contract by calling open, update, and close as expected.

It is recommended to make the bean definition method of such beans return the most specific known implementation, as shown in the following example:
Define a step-scoped bean with the most specific return type

@Bean
@StepScope
public FlatFileItemReader flatFileItemReader(@Value("#{jobParameters['input.file.name']}") String name) {
    return new FlatFileItemReaderBuilder<Foo>()
            .resource(new FileSystemResource(name))
            // set other properties of the item reader
            .build();
}
All batch processing can be described in its most simple form as reading in large amounts of data, performing some type of calculation or transformation, and writing the result out. Spring Batch provides three key interfaces to help perform bulk reading and writing:
ItemReader, ItemProcessor, and ItemWriter.
6.1. ItemReader
Although a simple concept, an ItemReader is the means for providing data from many different types of input. The most general examples include:

Flat File: Flat-file item readers read lines of data from a flat file that typically describes records with fields of data defined by fixed positions in the file or delimited by some special character (such as a comma).
XML: XML ItemReaders process XML independently of technologies used for parsing, mapping, and validating objects. Input data allows for the validation of an XML file against an XSD schema.

Database: A database resource is accessed to return result sets, which can be mapped to objects for processing. The default SQL ItemReader implementations invoke a RowMapper to return objects, keep track of the current row if restart is required, store basic statistics, and provide some transaction enhancements that are explained later.

There are many more possibilities, but we focus on the basic ones for this chapter. A complete list of all available ItemReader implementations can be found in Appendix A.
ItemReader is a basic interface for generic input operations, as shown in the following interface definition:

public interface ItemReader<T> {

    T read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException;

}
The read method defines the most essential contract of the ItemReader. Calling it returns one item or null if no more items are left. An item might represent a line in a file, a row in a database, or an element in an XML file. It is generally expected that these are mapped to a usable domain object (such as Trade, Foo, or others), but there is no requirement in the contract to do so.

It is expected that implementations of the ItemReader interface are forward only. However, if the underlying resource is transactional (such as a JMS queue), then calling read may return the same logical item on subsequent calls in a rollback scenario. It is also worth noting that a lack of items to process by an ItemReader does not cause an exception to be thrown. For example, a database ItemReader that is configured with a query that returns 0 results returns null on the first invocation of read.
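To make the contract concrete, the following sketch (a hypothetical reader, not part of the framework, reusing the illustrative Foo item type) hands out items from an in-memory list and signals the end of input by returning null:

import org.springframework.batch.item.ItemReader;

import java.util.Iterator;
import java.util.List;

// Hypothetical reader used only to illustrate the read() contract.
public class InMemoryFooReader implements ItemReader<Foo> {

    private final Iterator<Foo> items;

    public InMemoryFooReader(List<Foo> items) {
        this.items = items.iterator();
    }

    @Override
    public Foo read() {
        // Returning null signals "no more input" to the framework; it is not an error.
        return items.hasNext() ? items.next() : null;
    }
}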
6.2. ItemWriter
ItemWriter is similar in functionality to an ItemReader but with inverse operations. Resources still need to be located, opened, and closed, but they differ in that an ItemWriter writes out, rather than reading in. In the case of databases or queues, these operations may be inserts, updates, or sends. The format of the serialization of the output is specific to each batch job.

As with ItemReader, ItemWriter is a fairly generic interface, as shown in the following interface definition:

public interface ItemWriter<T> {

    void write(Chunk<? extends T> items) throws Exception;

}
As with read on ItemReader, write provides the basic contract of ItemWriter. It attempts to write out the chunk of items passed in as long as it is open. Because it is generally expected that items are 'batched' together into a chunk and then output, the interface accepts a chunk of items, rather than an item by itself. After writing out the chunk, any flushing that may be necessary can be performed before returning from the write method. For example, if writing to a Hibernate DAO, multiple calls to write can be made, one for each item. The writer can then call flush on the Hibernate session before returning.
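As a minimal sketch of this contract (a hypothetical writer, not part of the framework, again using the illustrative Foo item type), the writer below receives one whole chunk per call and could flush once per chunk rather than once per item:

import org.springframework.batch.item.Chunk;
import org.springframework.batch.item.ItemWriter;

// Hypothetical writer used only to illustrate the write() contract.
public class LoggingFooWriter implements ItemWriter<Foo> {

    @Override
    public void write(Chunk<? extends Foo> items) {
        for (Foo item : items) {
            System.out.println("Writing item: " + item);
        }
        // Any buffering or flushing would happen here, once per chunk.
    }
}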
6.3. ItemStream

Both ItemReaders and ItemWriters serve their individual purposes well, but there is a common concern among both of them that necessitates another interface. In general, as part of the scope of a batch job, readers and writers need to be opened and closed, and they require a mechanism for persisting state. The ItemStream interface serves that purpose, as shown in the following example:

public interface ItemStream {

    void open(ExecutionContext executionContext) throws ItemStreamException;

    void update(ExecutionContext executionContext) throws ItemStreamException;

    void close() throws ItemStreamException;
}
Before describing each method, we should mention the ExecutionContext. Clients of an ItemReader that also implement ItemStream should call open before any calls to read, in order to open any resources (such as files) or to obtain connections. A similar restriction applies to an ItemWriter that implements ItemStream. As mentioned in Chapter 2, if expected data is found in the ExecutionContext, it may be used to start the ItemReader or ItemWriter at a location other than its initial state. Conversely, close is called to ensure that any resources allocated during open are released safely. update is called primarily to ensure that any state currently being held is loaded into the provided ExecutionContext. This method is called before committing, to ensure that the current state is persisted in the database before commit.

In the special case where the client of an ItemStream is a Step (from the Spring Batch Core), an ExecutionContext is created for each StepExecution to allow users to store the state of a particular execution, with the expectation that it is returned if the same JobInstance is started again. For those familiar with Quartz, the semantics are very similar to a Quartz JobDataMap.
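The following sketch shows how a reader might use these callbacks to make itself restartable. It is purely illustrative: the class name and the "current.index" key are hypothetical, not part of the framework.

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStreamException;
import org.springframework.batch.item.ItemStreamReader;

import java.util.List;

// Hypothetical restartable reader that persists its position in the ExecutionContext.
public class RestartableListReader implements ItemStreamReader<String> {

    private static final String CURRENT_INDEX = "current.index";

    private final List<String> items;
    private int currentIndex = 0;

    public RestartableListReader(List<String> items) {
        this.items = items;
    }

    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        // On restart, resume from the last committed position.
        if (executionContext.containsKey(CURRENT_INDEX)) {
            currentIndex = executionContext.getInt(CURRENT_INDEX);
        }
    }

    @Override
    public String read() {
        return currentIndex < items.size() ? items.get(currentIndex++) : null;
    }

    @Override
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        // Called before each commit so that the position survives a restart.
        executionContext.putInt(CURRENT_INDEX, currentIndex);
    }

    @Override
    public void close() throws ItemStreamException {
        // Nothing to release for an in-memory list.
    }
}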
6.4. The Delegate Pattern and Registering with the Step
Note that the CompositeItemWriter is an example of the delegation pattern, which is common in Spring Batch. The delegates themselves might implement callback interfaces, such as StepListener. If they do, and if they are being used in conjunction with Spring Batch Core as part of a Step in a Job, then they almost certainly need to be registered manually with the Step. A reader, writer, or processor that is directly wired into the Step gets registered automatically if it implements ItemStream or a StepListener interface. However, because the delegates are not known to the Step, they need to be injected as listeners or streams (or both, if appropriate).

The following example shows how to inject a delegate as a stream in XML:
XML Configuration

<job id="ioSampleJob">
    <step name="step1">
        <tasklet>
            <chunk reader="fooReader" processor="fooProcessor" writer="compositeItemWriter"
                   commit-interval="2">
                <streams>
                    <stream ref="barWriter" />
                </streams>
            </chunk>
        </tasklet>
    </step>
</job>

<bean id="compositeItemWriter" class="...CustomCompositeItemWriter">
    <property name="delegate" ref="barWriter" />
</bean>

<bean id="barWriter" class="...BarWriter" />
The following example shows how to inject a delegate as a stream in Java:
Java Configuration

@Bean
public Job ioSampleJob(JobRepository jobRepository) {
    return new JobBuilder("ioSampleJob", jobRepository)
                .start(step1())
                .build();
}

@Bean
public Step step1(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
    return new StepBuilder("step1", jobRepository)
                .<String, String>chunk(2, transactionManager)
                .reader(fooReader())
                .processor(fooProcessor())
                .writer(compositeItemWriter())
                .stream(barWriter())
                .build();
}

@Bean
public CustomCompositeItemWriter compositeItemWriter() {
    CustomCompositeItemWriter writer = new CustomCompositeItemWriter();
    writer.setDelegate(barWriter());
    return writer;
}

@Bean
public BarWriter barWriter() {
    return new BarWriter();
}
6.5. Flat Files
One of the most common mechanisms for interchanging bulk data has always been the flat file. Unlike XML, which has an agreed upon standard for defining how it is structured (XSD), anyone reading a flat file must understand ahead of time exactly how the file is structured. In general, all flat files fall into two types: delimited and fixed length. Delimited files are those in which fields are separated by a delimiter, such as a comma. Fixed Length files have fields that are a set length.
6.5.1. The FieldSet
When working with flat files in Spring Batch, regardless of whether it is for input or output, one of the most important classes is the FieldSet. Many architectures and libraries contain abstractions to help you read in from a file, but they usually return a String or an array of String objects. That really only gets you halfway there. A FieldSet is Spring Batch's abstraction for enabling the binding of fields from a file resource. It allows developers to work with file input in much the same way as they would work with database input. A FieldSet is conceptually similar to a JDBC ResultSet. A FieldSet requires only one argument: a String array of tokens. Optionally, you can also configure the names of the fields so that the fields may be accessed either by index or name, as patterned after ResultSet, as shown in the following example:

String[] tokens = new String[]{"foo", "1", "true"};
FieldSet fs = new DefaultFieldSet(tokens);
String name = fs.readString(0);
int value = fs.readInt(1);
boolean booleanValue = fs.readBoolean(2);
There are many more options on the FieldSet interface, such as Date, long, BigDecimal, and so on. The biggest advantage of the FieldSet is that it provides consistent parsing of flat-file input. Rather than each batch job parsing differently in potentially unexpected ways, it can be consistent, both when handling errors caused by a format exception and when doing simple data conversions.
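As a small illustration of that consistency (the token values below are made up for the example), the same FieldSet API handles typed reads and default values uniformly:

FieldSet fs = new DefaultFieldSet(new String[] {"shipped", "2024-01-15", ""});
String status = fs.readString(0);
Date shippedOn = fs.readDate(1, "yyyy-MM-dd");           // parsed with the given pattern
BigDecimal fee = fs.readBigDecimal(2, BigDecimal.ZERO);   // default used when the token is empty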
6.5.2. FlatFileItemReader
A flat file is any type of file that contains at most two-dimensional (tabular) data. Reading flat files in the Spring Batch framework is facilitated by the class called FlatFileItemReader, which provides basic functionality for reading and parsing flat files. The two most important required dependencies of FlatFileItemReader are Resource and LineMapper. The LineMapper interface is explored more in the next sections. The resource property represents a Spring Core Resource. Documentation explaining how to create beans of this type can be found in Spring Framework, Chapter 5. Resources. Therefore, this guide does not go into the details of creating Resource objects beyond showing the following simple example:

Resource resource = new FileSystemResource("resources/trades.csv");
In complex batch environments, the directory structures are often managed by the Enterprise Application Integration (EAI) infrastructure, where drop zones for external interfaces are established for moving files from FTP locations to batch processing locations and vice versa. File moving utilities are beyond the scope of the Spring Batch architecture, but it is not unusual for batch job streams to include file moving utilities as steps in the job stream. The batch architecture only needs to know how to locate the files to be processed. Spring Batch begins the process of feeding the data into the pipe from this starting point. However, Spring Integration provides many of these types of services.
The other properties of the FlatFileItemReader let you further specify how your data is interpreted, as described in the following table:

Table 15. FlatFileItemReader Properties

recordSeparatorPolicy (RecordSeparatorPolicy): Used to determine where the line endings are and to do things like continue over a line ending if inside a quoted string.

resource (Resource): The resource from which to read.

skippedLinesCallback (LineCallbackHandler): Interface that passes the raw line content of the lines in the file to be skipped. If linesToSkip is set to 2, this interface is called twice.

strict (boolean): In strict mode, the reader throws an exception on ExecutionContext if the input resource does not exist. Otherwise, it logs the problem and continues.
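To make a few of these properties concrete, here is a minimal sketch, assuming the Player class and CSV layout from the example later in this chapter; the header-skip count, file path, and bean name are illustrative:

@Bean
public FlatFileItemReader<Player> playerReader() {
    return new FlatFileItemReaderBuilder<Player>()
            .name("playerReader")
            .resource(new FileSystemResource("resources/players.csv"))
            .linesToSkip(1)                                                       // skip the header line
            .skippedLinesCallback(line -> System.out.println("Skipped: " + line))
            .strict(false)                                                        // log and continue if the resource is missing
            .delimited()
            .names(new String[] {"ID", "lastName", "firstName", "position", "birthYear", "debutYear"})
            .targetType(Player.class)
            .build();
}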
LineMapper
As with RowMapper, which takes a low-level construct such as ResultSet and returns an Object, flat-file processing requires the same construct to convert a String line into an Object, as shown in the following interface definition:

public interface LineMapper<T> {

    T mapLine(String line, int lineNumber) throws Exception;

}
The basic contract is that, given the current line and the line number with which it is associated, the mapper should return a resulting domain object. This is similar to RowMapper, in that each line is associated with its line number, just as each row in a ResultSet is tied to its row number. This allows the line number to be tied to the resulting domain object for identity comparison or for more informative logging. However, unlike RowMapper, the LineMapper is given a raw line which, as discussed above, only gets you halfway there. The line must be tokenized into a FieldSet, which can then be mapped to an object, as described later in this document.
LineTokenizer
An abstraction for turning a line of input into a FieldSet is necessary because there can be many formats of flat-file data that need to be converted to a FieldSet. In Spring Batch, this interface is the LineTokenizer:

public interface LineTokenizer {

    FieldSet tokenize(String line);

}
The contract of a LineTokenizer is such that, given a line of input (in theory the String could encompass more than one line), a FieldSet representing the line is returned. This FieldSet can then be passed to a FieldSetMapper. Spring Batch contains the following LineTokenizer implementations:
DelimitedLineTokenizer: Used for files where fields in a record are separated by a delimiter. The most common delimiter is a comma, but pipes or semicolons are often used as well.

FixedLengthTokenizer: Used for files where fields in a record are each a "fixed width". The width of each field must be defined for each record type.

PatternMatchingCompositeLineTokenizer: Determines which LineTokenizer among a list of tokenizers should be used on a particular line by checking against a pattern.
FieldSetMapper
The FieldSetMapper interface defines a single method, mapFieldSet, which takes a FieldSet object and maps its contents to an object. This object may be a custom DTO, a domain object, or an array, depending on the needs of the job. The FieldSetMapper is used in conjunction with the LineTokenizer to translate a line of data from a resource into an object of the desired type, as shown in the following interface definition:

public interface FieldSetMapper<T> {

    T mapFieldSet(FieldSet fieldSet) throws BindException;

}
DefaultLineMapper
Now that the basic interfaces for reading in flat files have been defined, it becomes clear that three basic steps are required:

1. Read one line from the file.
2. Pass the String line into the LineTokenizer#tokenize() method to retrieve a FieldSet.
3. Pass the FieldSet returned from tokenizing to a FieldSetMapper, returning the result from the ItemReader#read() method.
The two interfaces described above represent two separate tasks: converting a line into a FieldSet and mapping a FieldSet to a domain object. Because the input of a LineTokenizer matches the input of the LineMapper (a line), and the output of a FieldSetMapper matches the output of the LineMapper, a default implementation that uses both a LineTokenizer and a FieldSetMapper is provided. The DefaultLineMapper, shown in the following class definition, represents the behavior most users need:

public class DefaultLineMapper<T> implements LineMapper<T>, InitializingBean {

    private LineTokenizer tokenizer;

    private FieldSetMapper<T> fieldSetMapper;

    public T mapLine(String line, int lineNumber) throws Exception {
        return fieldSetMapper.mapFieldSet(tokenizer.tokenize(line));
    }

    public void setLineTokenizer(LineTokenizer tokenizer) {
        this.tokenizer = tokenizer;
    }

    public void setFieldSetMapper(FieldSetMapper<T> fieldSetMapper) {
        this.fieldSetMapper = fieldSetMapper;
    }
}
The above functionality is provided in a default implementation, rather than being built into the reader itself (as was done in previous versions of the framework) to allow users greater flexibility in controlling the parsing process, especially if access to the raw line is needed.
Simple Delimited File Reading Example
The following example illustrates how to read a flat file with an actual domain scenario. This particular batch job reads in football players from the following file:
ID,lastName,firstName,position,birthYear,debutYear
"AbduKa00,Abdul-Jabbar,Karim,rb,1974,1996",
"AbduRa00,Abdullah,Rabih,rb,1975,1999",
"AberWa00,Abercrombie,Walter,rb,1959,1982",
"AbraDa00,Abramowicz,Danny,wr,1945,1967",
"AdamBo00,Adams,Bob,te,1946,1969",
"AdamCh00,Adams,Charlie,wr,1979,2003"

The contents of this file are mapped to the following Player domain object:

public class Player implements Serializable {

    private String ID;
    private String lastName;
    private String firstName;
    private String position;
    private int birthYear;
    private int debutYear;

    public String toString() {
        return "PLAYER:ID=" + ID + ",Last Name=" + lastName +
            ",First Name=" + firstName + ",Position=" + position +
            ",Birth Year=" + birthYear + ",DebutYear=" + debutYear;
    }

    // setters and getters...
}
To map a FieldSet into a Player object, a FieldSetMapper that returns players needs to be defined, as shown in the following example:

protected static class PlayerFieldSetMapper implements FieldSetMapper<Player> {
    public Player mapFieldSet(FieldSet fieldSet) {
        Player player = new Player();

        player.setID(fieldSet.readString(0));
        player.setLastName(fieldSet.readString(1));
        player.setFirstName(fieldSet.readString(2));
        player.setPosition(fieldSet.readString(3));
        player.setBirthYear(fieldSet.readInt(4));
        player.setDebutYear(fieldSet.readInt(5));

        return player;
    }
}
The file can then be read by correctly constructing a FlatFileItemReader and calling read, as shown in the following example:

FlatFileItemReader<Player> itemReader = new FlatFileItemReader<>();
itemReader.setResource(new FileSystemResource("resources/players.csv"));
DefaultLineMapper<Player> lineMapper = new DefaultLineMapper<>();
// DelimitedLineTokenizer defaults to comma as its delimiter
lineMapper.setLineTokenizer(new DelimitedLineTokenizer());
lineMapper.setFieldSetMapper(new PlayerFieldSetMapper());
itemReader.setLineMapper(lineMapper);
itemReader.open(new ExecutionContext());
Player player = itemReader.read();
Each call to read returns a new Player object from each line in the file. When the end of the file is reached, null is returned.
Mapping Fields by Name
There is one additional piece of functionality that is allowed by both DelimitedLineTokenizer and FixedLengthTokenizer and that is similar in function to a JDBC ResultSet. The names of the fields can be injected into either of these LineTokenizer implementations to increase the readability of the mapping function. First, the column names of all fields in the flat file are injected into the tokenizer, as shown in the following example:

tokenizer.setNames(new String[] {"ID", "lastName", "firstName", "position", "birthYear", "debutYear"});
A FieldSetMapper can then use this information as follows:

public class PlayerMapper implements FieldSetMapper<Player> {
    public Player mapFieldSet(FieldSet fs) {

        if (fs == null) {
            return null;
        }

        Player player = new Player();
        player.setID(fs.readString("ID"));
        player.setLastName(fs.readString("lastName"));
        player.setFirstName(fs.readString("firstName"));
        player.setPosition(fs.readString("position"));
        player.setDebutYear(fs.readInt("debutYear"));
        player.setBirthYear(fs.readInt("birthYear"));

        return player;
    }
}
Automapping FieldSets to Domain Objects
For many, having to write a specific FieldSetMapper is equally as cumbersome as writing a specific RowMapper for a JdbcTemplate. Spring Batch makes this easier by providing a FieldSetMapper that automatically maps fields by matching a field name with a setter on the object, using the JavaBean specification.

Again using the football example, the BeanWrapperFieldSetMapper configuration looks like the following snippet in XML:

XML Configuration

<bean id="fieldSetMapper"
      class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
    <property name="prototypeBeanName" value="player" />
</bean>

<bean id="player"
      class="org.springframework.batch.sample.domain.Player"
      scope="prototype" />
Again using the football example, the BeanWrapperFieldSetMapper configuration looks like the following snippet in Java:

Java Configuration

@Bean
public FieldSetMapper fieldSetMapper() {
    BeanWrapperFieldSetMapper fieldSetMapper = new BeanWrapperFieldSetMapper();

    fieldSetMapper.setPrototypeBeanName("player");

    return fieldSetMapper;
}

@Bean
@Scope("prototype")
public Player player() {
    return new Player();
}
For each entry in the FieldSet, the mapper looks for a corresponding setter on a new instance of the Player object (for this reason, prototype scope is required) in the same way the Spring container looks for setters matching a property name. Each available field in the FieldSet is mapped, and the resultant Player object is returned, with no code required.
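As a sketch of the same idea without a prototype bean (assuming the Player class from the earlier example), the target type can also be set directly on the mapper:

BeanWrapperFieldSetMapper<Player> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
fieldSetMapper.setTargetType(Player.class);
// The field names configured on the tokenizer ("ID", "lastName", and so on)
// must match the setters on Player for the automatic mapping to work.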
Fixed Length File Formats
So far, only delimited files have been discussed in much detail. However, they represent only half of the file reading picture. Many organizations that use flat files use fixed length formats. An example fixed length file follows:
UK21341EAH4121131.11customer1
UK21341EAH4221232.11customer2
UK21341EAH4321333.11customer3
UK21341EAH4421434.11customer4
UK21341EAH4521535.11customer5

While this looks like one large field, it actually represents 4 distinct fields:

1. ISIN: Unique identifier for the item being ordered - 12 characters long.
2. Quantity: Number of the item being ordered - 3 characters long.
3. Price: Price of the item - 5 characters long.
4. Customer: ID of the customer ordering the item - 9 characters long.
When configuring the FixedLengthLineTokenizer, each of these lengths must be provided in the form of ranges.

The following example shows how to define ranges for the FixedLengthLineTokenizer in XML:

XML Configuration

<bean id="fixedLengthLineTokenizer"
      class="org.springframework.batch.item.file.transform.FixedLengthTokenizer">
    <property name="names" value="ISIN,Quantity,Price,Customer" />
    <property name="columns" value="1-12, 13-15, 16-20, 21-29" />
</bean>
Because the FixedLengthLineTokenizer uses the same LineTokenizer interface as discussed earlier, it returns the same FieldSet as if a delimiter had been used. This allows the same approaches to be used in handling its output, such as using the BeanWrapperFieldSetMapper.

Supporting the preceding syntax for ranges requires that a specialized property editor, RangeArrayPropertyEditor, be configured in the ApplicationContext. However, this bean is automatically declared in an ApplicationContext where the batch namespace is used.

The following example shows how to define ranges for the FixedLengthLineTokenizer in Java:

Java Configuration

@Bean
public FixedLengthTokenizer fixedLengthTokenizer() {
    FixedLengthTokenizer tokenizer = new FixedLengthTokenizer();

    tokenizer.setNames("ISIN", "Quantity", "Price", "Customer");
    tokenizer.setColumns(new Range(1, 12),
                         new Range(13, 15),
                         new Range(16, 20),
                         new Range(21, 29));

    return tokenizer;
}
Because the FixedLengthLineTokenizer uses the same LineTokenizer interface as discussed above, it returns the same FieldSet as if a delimiter had been used. This lets the same approaches be used in handling its output, such as using the BeanWrapperFieldSetMapper.
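A quick sketch of that FieldSet in isolation, using the first sample line above and the ranges just configured (the variable names are illustrative):

FixedLengthTokenizer tokenizer = new FixedLengthTokenizer();
tokenizer.setNames("ISIN", "Quantity", "Price", "Customer");
tokenizer.setColumns(new Range(1, 12), new Range(13, 15), new Range(16, 20), new Range(21, 29));

FieldSet fieldSet = tokenizer.tokenize("UK21341EAH4121131.11customer1");
String isin = fieldSet.readString("ISIN");            // "UK21341EAH41"
int quantity = fieldSet.readInt("Quantity");           // 211
BigDecimal price = fieldSet.readBigDecimal("Price");   // 31.11
String customer = fieldSet.readString("Customer");     // "customer1"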
Multiple Record Types within a Single File
All of the file reading examples up to this point have made a key assumption for simplicity's sake: all of the records in a file have the same format. However, this may not always be the case. It is very common that a file might have records with different formats that need to be tokenized differently and mapped to different objects. The following excerpt from a file illustrates this:
USER;Smith;Peter;;T;20014539;F
LINEA;1044391041ABC037.49G201XX1383.12H
LINEB;2134776319DEF422.99M005LI

In this file we have three types of records: "USER", "LINEA", and "LINEB". A "USER" line corresponds to a User object. "LINEA" and "LINEB" both correspond to Line objects, though a "LINEA" has more information than a "LINEB".

The ItemReader reads each line individually, but we must specify different LineTokenizer and FieldSetMapper objects so that the ItemWriter receives the correct items. The PatternMatchingCompositeLineMapper makes this easy by allowing maps of patterns to LineTokenizers and patterns to FieldSetMappers to be configured.

The following example shows how to configure the PatternMatchingCompositeLineMapper in XML:

XML Configuration

<bean id="orderFileLineMapper"
      class="org.spr...PatternMatchingCompositeLineMapper">
    <property name="tokenizers">
        <map>
            <entry key="USER*" value-ref="userTokenizer" />
            <entry key="LINEA*" value-ref="lineATokenizer" />
            <entry key="LINEB*" value-ref="lineBTokenizer" />
        </map>
    </property>
    <property name="fieldSetMappers">
        <map>
            <entry key="USER*" value-ref="userFieldSetMapper" />
            <entry key="LINE*" value-ref="lineFieldSetMapper" />
        </map>
    </property>
</bean>
The following example shows how to configure the PatternMatchingCompositeLineMapper in Java:

Java Configuration

@Bean
public PatternMatchingCompositeLineMapper orderFileLineMapper() {
    PatternMatchingCompositeLineMapper lineMapper =
        new PatternMatchingCompositeLineMapper();

    Map<String, LineTokenizer> tokenizers = new HashMap<>(3);
    tokenizers.put("USER*", userTokenizer());
    tokenizers.put("LINEA*", lineATokenizer());
    tokenizers.put("LINEB*", lineBTokenizer());
    lineMapper.setTokenizers(tokenizers);

    Map<String, FieldSetMapper> mappers = new HashMap<>(2);
    mappers.put("USER*", userFieldSetMapper());
    mappers.put("LINE*", lineFieldSetMapper());
    lineMapper.setFieldSetMappers(mappers);

    return lineMapper;
}
In this example, "LINEA" and "LINEB" have separate
LineTokenizer
instances, but they both use the sameFieldSetMapper
.The
PatternMatchingCompositeLineMapper
uses thePatternMatcher#match
method in order to select the correct delegate for each line. ThePatternMatcher
allows for two wildcard characters with special meaning: the question mark ("?") matches exactly one character, while the asterisk ("*") matches zero or more characters. Note that, in the preceding configuration, all patterns end with an asterisk, making them effectively prefixes to lines. ThePatternMatcher
always matches the most specific pattern possible, regardless of the order in the configuration. So if "LINE*" and "LINEA*" were both listed as patterns, "LINEA" would match pattern "LINEA*", while "LINEB" would match pattern "LINE*". Additionally, a single asterisk ("*") can serve as a default by matching any line not matched by any other pattern.The following example shows how to match a line not matched by any other pattern in XML:
XML Configuration

<entry key="*" value-ref="defaultLineTokenizer" />
The following example shows how to match a line not matched by any other pattern in Java:
Java Configuration

tokenizers.put("*", defaultLineTokenizer());

There is also a PatternMatchingCompositeLineTokenizer that can be used for tokenization alone.

It is also common for a flat file to contain records that each span multiple lines. To handle this situation, a more complex strategy is required. A demonstration of this common pattern can be found in the multiLineRecords sample.
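Returning to the wildcard rules described above, a short sketch of PatternMatcher#match in isolation (the sample lines are taken from the file excerpt earlier in this section):

import org.springframework.batch.support.PatternMatcher;

// '*' matches zero or more characters; '?' matches exactly one character.
boolean a = PatternMatcher.match("LINEA*", "LINEA;1044391041ABC037.49G201XX1383.12H"); // true
boolean b = PatternMatcher.match("LINE?", "LINEB");                                    // true
boolean c = PatternMatcher.match("USER*", "LINEB;2134776319DEF422.99M005LI");          // false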
Exception Handling in Flat Files
There are many scenarios when tokenizing a line may cause exceptions to be thrown. Many flat files are imperfect and contain incorrectly formatted records. Many users choose to skip these erroneous lines while logging the issue, the original line, and the line number. These logs can later be inspected manually or by another batch job. For this reason, Spring Batch provides a hierarchy of exceptions for handling parse exceptions: FlatFileParseException and FlatFileFormatException. FlatFileParseException is thrown by the FlatFileItemReader when any errors are encountered while trying to read a file. FlatFileFormatException is thrown by implementations of the LineTokenizer interface and indicates a more specific error encountered while tokenizing.
IncorrectTokenCountException
Both DelimitedLineTokenizer and FixedLengthLineTokenizer have the ability to specify column names that can be used for creating a FieldSet. However, if the number of column names does not match the number of columns found while tokenizing a line, the FieldSet cannot be created, and an IncorrectTokenCountException is thrown, which contains the number of tokens encountered and the number expected, as shown in the following example:

tokenizer.setNames(new String[] {"A", "B", "C", "D"});

try {
    tokenizer.tokenize("a,b,c");
}
catch (IncorrectTokenCountException e) {
    assertEquals(4, e.getExpectedCount());
    assertEquals(3, e.getActualCount());
}
IncorrectLineLengthException
Files formatted in a fixed-length format have additional requirements when parsing because, unlike a delimited format, each column must strictly adhere to its predefined width. If the total line length does not add up to the widest value of the configured columns, an exception is thrown, as shown in the following example:
tokenizer.setColumns(new Range[] { new Range(1, 5),
                                   new Range(6, 10),
                                   new Range(11, 15) });

try {
    tokenizer.tokenize("12345");
    fail("Expected IncorrectLineLengthException");
}
catch (IncorrectLineLengthException ex) {
    assertEquals(15, ex.getExpectedLength());
    assertEquals(5, ex.getActualLength());
}
The configured ranges for the preceding tokenizer are 1-5, 6-10, and 11-15. Consequently, the total length of the line is 15. However, in the preceding example, a line of length 5 was passed in, causing an IncorrectLineLengthException to be thrown. Throwing an exception here, rather than only mapping the first column, allows the processing of the line to fail earlier and with more information than it would contain if it failed while trying to read in column 2 in a FieldSetMapper. However, there are scenarios where the length of the line is not always constant. For this reason, validation of line length can be turned off via the 'strict' property, as shown in the following example:
tokenizer.setColumns(new Range[] { new Range(1, 5), new Range(6, 10) });
tokenizer.setStrict(false);
FieldSet tokens = tokenizer.tokenize("12345");
assertEquals("12345", tokens.readString(0));
assertEquals("", tokens.readString(1));
The preceding example is almost identical to the one before it, except that tokenizer.setStrict(false) was called. This setting tells the tokenizer not to enforce line lengths when tokenizing the line. A FieldSet is now correctly created and returned. However, it contains only empty tokens for the remaining values.
6.5.3. FlatFileItemWriter
Writing out to flat files has the same problems and issues that reading in from a file must overcome. A step must be able to write either delimited or fixed length formats in a transactional manner.
LineAggregator
Just as the LineTokenizer interface is necessary to take a line of input and turn it into usable fields when reading, file writing must have a way to aggregate multiple fields into a single string for writing to a file. In Spring Batch, this is the LineAggregator, shown in the following interface definition:

public interface LineAggregator<T> {

    public String aggregate(T item);

}
The LineAggregator is the logical opposite of LineTokenizer. LineTokenizer takes a String and returns a FieldSet, whereas LineAggregator takes an item and returns a String.
PassThroughLineAggregator
The most basic implementation of the LineAggregator interface is the PassThroughLineAggregator, which assumes that the object is already a string or that its string representation is acceptable for writing, as shown in the following code:

public class PassThroughLineAggregator<T> implements LineAggregator<T> {

    public String aggregate(T item) {
        return item.toString();
    }
}
The preceding implementation is useful if direct control of creating the string is required but the advantages of a FlatFileItemWriter, such as transaction and restart support, are necessary.
Simplified File Writing Example
Now that the LineAggregator interface and its most basic implementation, PassThroughLineAggregator, have been defined, the basic flow of writing can be explained:

1. The object to be written is passed to the LineAggregator in order to obtain a String.
2. The returned String is written to the configured file.

The following excerpt from the FlatFileItemWriter expresses this in code:

public void write(T item) throws Exception {
    write(lineAggregator.aggregate(item) + LINE_SEPARATOR);
}
<bean id="itemWriter" class="org.spr...FlatFileItemWriter"> <property name="resource" value="file:target/test-outputs/output.txt" /> <property name="lineAggregator"> <bean class="org.spr...PassThroughLineAggregator"/> </property> </bean>
In Java, a simple example of configuration might look like the following:
Java Configuration
@Bean
public FlatFileItemWriter itemWriter() {
    return new FlatFileItemWriterBuilder<Foo>()
            .name("itemWriter")
            .resource(new FileSystemResource("target/test-outputs/output.txt"))
            .lineAggregator(new PassThroughLineAggregator<>())
            .build();
}
FieldExtractor
The preceding example may be useful for the most basic uses of writing to a file. However, most users of the FlatFileItemWriter have a domain object that needs to be written out and, thus, must be converted into a line. In file reading, the following was required:

1. Read one line from the file.
2. Pass the line into the LineTokenizer#tokenize() method to retrieve a FieldSet.
3. Pass the FieldSet returned from tokenizing to a FieldSetMapper, returning the result from the ItemReader#read() method.

File writing has similar but inverse steps:

1. Pass the item to be written to the writer.
2. Convert the fields on the item into an array.
3. Aggregate the resulting array into a line.

Because there is no way for the framework to know which fields from the object need to be written out, a FieldExtractor must be written to accomplish the task of turning the item into an array, as shown in the following interface definition:

public interface FieldExtractor<T> {

    Object[] extract(T item);

}
Implementations of the FieldExtractor interface should create an array from the fields of the provided object, which can then be written out with a delimiter between the elements or as part of a fixed-width line.
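For illustration, a hand-written extractor for the Player class used earlier in this chapter might look like the following sketch (the class name and chosen fields are illustrative, not framework code):

// Hypothetical extractor that decides which Player fields end up in the output line.
public class PlayerFieldExtractor implements FieldExtractor<Player> {

    @Override
    public Object[] extract(Player player) {
        return new Object[] { player.getLastName(), player.getFirstName(), player.getPosition() };
    }
}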
PassThroughFieldExtractor
There are many cases where a collection, such as an array, a Collection, or a FieldSet, needs to be written out. "Extracting" an array from one of these collection types is very straightforward. To do so, convert the collection to an array. Therefore, the PassThroughFieldExtractor should be used in this scenario. It should be noted that, if the object passed in is not a type of collection, the PassThroughFieldExtractor returns an array containing solely the item to be extracted.
BeanWrapperFieldExtractor
As with the BeanWrapperFieldSetMapper described in the file reading section, it is often preferable to configure how to convert a domain object to an object array, rather than writing the conversion yourself. The BeanWrapperFieldExtractor provides this functionality, as shown in the following example:

BeanWrapperFieldExtractor<Name> extractor = new BeanWrapperFieldExtractor<>();
extractor.setNames(new String[] { "first", "last", "born" });

String first = "Alan";
String last = "Turing";
int born = 1912;

Name n = new Name(first, last, born);
Object[] values = extractor.extract(n);

assertEquals(first, values[0]);
assertEquals(last, values[1]);
assertEquals(born, values[2]);
This extractor implementation has only one required property: the names of the fields to map. Just as the BeanWrapperFieldSetMapper needs field names to map fields on the FieldSet to setters on the provided object, the BeanWrapperFieldExtractor needs names to map to getters for creating an object array. It is worth noting that the order of the names determines the order of the fields within the array.
Delimited File Writing Example
The most basic flat-file format is one in which all fields are separated by a delimiter. This can be accomplished by using a DelimitedLineAggregator. The following example writes out a simple domain object that represents a credit to a customer account:

public class CustomerCredit {

    private int id;
    private String name;
    private BigDecimal credit;

    //getters and setters removed for clarity
}
Because a domain object is being used, an implementation of the FieldExtractor interface must be provided, along with the delimiter to use.

The following example shows how to use the FieldExtractor with a delimiter in XML:

XML Configuration

<bean id="itemWriter" class="org.springframework.batch.item.file.FlatFileItemWriter">
    <property name="resource" ref="outputResource" />
    <property name="lineAggregator">
        <bean class="org.spr...DelimitedLineAggregator">
            <property name="delimiter" value=","/>
            <property name="fieldExtractor">
                <bean class="org.spr...BeanWrapperFieldExtractor">
                    <property name="names" value="name,credit"/>
                </bean>
            </property>
        </bean>
    </property>
</bean>
The following example shows how to use the FieldExtractor with a delimiter in Java:

Java Configuration
@Bean
public FlatFileItemWriter<CustomerCredit> itemWriter(Resource outputResource) throws Exception {
    BeanWrapperFieldExtractor<CustomerCredit> fieldExtractor = new BeanWrapperFieldExtractor<>();
    fieldExtractor.setNames(new String[] {"name", "credit"});
    fieldExtractor.afterPropertiesSet();

    DelimitedLineAggregator<CustomerCredit> lineAggregator = new DelimitedLineAggregator<>();
    lineAggregator.setDelimiter(",");
    lineAggregator.setFieldExtractor(fieldExtractor);

    return new FlatFileItemWriterBuilder<CustomerCredit>()
                    .name("customerCreditWriter")
                    .resource(outputResource)
                    .lineAggregator(lineAggregator)
                    .build();
}
In the previous example, the BeanWrapperFieldExtractor described earlier in this chapter is used to turn the name and credit fields within CustomerCredit into an object array, which is then written out with commas between each field.

It is also possible to use the FlatFileItemWriterBuilder.DelimitedBuilder to automatically create the BeanWrapperFieldExtractor and DelimitedLineAggregator, as shown in the following example:

Java Configuration
@Bean
public FlatFileItemWriter<CustomerCredit> itemWriter(Resource outputResource) throws Exception {
    return new FlatFileItemWriterBuilder<CustomerCredit>()
                    .name("customerCreditWriter")
                    .resource(outputResource)
                    .delimited()
                    .delimiter("|")
                    .names(new String[] {"name", "credit"})
                    .build();
}
Fixed Width File Writing Example
Delimited is not the only type of flat-file format. Many prefer to use a set width for each column to delineate between fields, which is usually referred to as 'fixed width'. Spring Batch supports this in file writing with the FormatterLineAggregator.

Using the same CustomerCredit domain object described above, it can be configured as follows in XML:

XML Configuration

<bean id="itemWriter" class="org.springframework.batch.item.file.FlatFileItemWriter">
    <property name="resource" ref="outputResource" />
    <property name="lineAggregator">
        <bean class="org.spr...FormatterLineAggregator">
            <property name="fieldExtractor">
                <bean class="org.spr...BeanWrapperFieldExtractor">
                    <property name="names" value="name,credit" />
                </bean>
            </property>
            <property name="format" value="%-9s%-2.0f" />
        </bean>
    </property>
</bean>
Using the same CustomerCredit domain object described above, it can be configured as follows in Java:

Java Configuration

@Bean
public FlatFileItemWriter<CustomerCredit> itemWriter(Resource outputResource) throws Exception {
    BeanWrapperFieldExtractor<CustomerCredit> fieldExtractor = new BeanWrapperFieldExtractor<>();
    fieldExtractor.setNames(new String[] {"name", "credit"});
    fieldExtractor.afterPropertiesSet();

    FormatterLineAggregator<CustomerCredit> lineAggregator = new FormatterLineAggregator<>();
    lineAggregator.setFormat("%-9s%-2.0f");
    lineAggregator.setFieldExtractor(fieldExtractor);

    return new FlatFileItemWriterBuilder<CustomerCredit>()
                    .name("customerCreditWriter")
                    .resource(outputResource)
                    .lineAggregator(lineAggregator)
                    .build();
}
Most of the preceding example should look familiar. However, the value of the format property is new.
The following example shows the format property in XML: