Thursday, December 22, 2016

SOA 12c: Process Large Files Using Oracle MFT & File Adapter Chunked Read Option

SOA 12c adds a new ChunkedRead operation to the JCA File Adapter. Prior to this, users had to use a SynchRead operation and then edit the JCA file to achieve a "chunked read". In this post, I explain how to process a large file in chunks using the SOA File Adapter, along with some best practices. A major advantage of chunking large files is that it reduces the amount of data loaded into memory at any one time and makes more efficient use of the translator resources.

"File processing" here means reading, parsing and translating the file contents. If you just want to move or transfer a file, consider using MFT for the best performance, efficiency & scalability.

In this example, MFT gets a large customer records file from a remote SFTP location and sends it to the SOA layer for further processing. The MFT configuration is pretty straightforward and is out of scope for this entry. For more info on Oracle MFT read here.

SOA 12c offers tight integration with Oracle MFT through the simple to use MFT adapter. If the MFT adapter is configured as a service, MFT can directly pass the file either inline or as a reference to the SOA process. If configured as a reference, it enables a SOA process to leverage MFT to transfer a file.

MFT also provides a bunch of useful file metadata (target file name, directory, file size, etc.) as part of the MFT header in the SOAP request.

Create a File Adapter:

Drag & drop a File adapter to the "External References" swimlane of the SOA composite. Follow the instructions in the wizard to complete the configuration. Ensure that you choose the "Chunked Read" operation and define a chunk size. This is the number of records that will be read from the file in each iteration. For example, if you have 500 records and a chunk size of 100, the adapter reads the file in 5 chunks.
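For reference, the wizard stores these choices in the adapter's .jca file. The fragment below is only a rough sketch of what that file may contain for a chunked read. The adapter name, WSDL path, directory and file name are placeholders I made up, and the exact attributes generated by your JDeveloper version may differ.

```xml
<!-- Sketch of a File Adapter .jca file for a chunked read.
     ReadCustomers, /tmp/inbound and customers.csv are hypothetical placeholders. -->
<adapter-config name="ReadCustomers" adapter="file"
                wsdlLocation="../WSDLs/ReadCustomers.wsdl"
                xmlns="http://platform.integration.oracle/blocks/adapter/fw/metadata">
  <connection-factory location="eis/FileAdapter"/>
  <endpoint-interaction portType="ChunkedRead_ptt" operation="ChunkedRead">
    <interaction-spec className="oracle.tip.adapter.file.outbound.ChunkedInteractionSpec">
      <!-- Defaults; both can be overridden at runtime with
           jca.file.Directory and jca.file.FileName -->
      <property name="PhysicalDirectory" value="/tmp/inbound"/>
      <property name="FileName" value="customers.csv"/>
      <!-- Number of records returned per invoke -->
      <property name="ChunkSize" value="100"/>
    </interaction-spec>
  </endpoint-interaction>
</adapter-config>
```

With a ChunkSize of 100 and a 500-record file, the BPEL process would need five invokes to reach end-of-file.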

You will also have to create an NXSD schema, which can be generated from a sample flat file. The file adapter uses the NXSD to parse the flat file and convert it into XML.
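To give an idea of what such a schema looks like, here is a minimal NXSD sketch for a comma-delimited customer file with one record per line. The namespace, element names and field layout (Id, Name, Email) are assumptions for illustration only; your generated schema will reflect your own sample file.

```xml
<!-- Hypothetical NXSD for lines such as: 1001,Jane Doe,jane@example.com -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:nxsd="http://xmlns.oracle.com/pcbpel/nxsd"
            xmlns:tns="http://example.com/customers"
            targetNamespace="http://example.com/customers"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified"
            nxsd:version="NXSD"
            nxsd:stream="chars"
            nxsd:encoding="UTF-8">
  <xsd:element name="Customers">
    <xsd:complexType>
      <xsd:sequence>
        <!-- One Customer element per line; this is the "record"
             that the chunk size counts -->
        <xsd:element name="Customer" minOccurs="1" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="Id" type="xsd:string"
                           nxsd:style="terminated" nxsd:terminatedBy=","/>
              <xsd:element name="Name" type="xsd:string"
                           nxsd:style="terminated" nxsd:terminatedBy=","/>
              <!-- Last field on the line ends at end-of-line -->
              <xsd:element name="Email" type="xsd:string"
                           nxsd:style="terminated" nxsd:terminatedBy="${eol}"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```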

Implementing the BPEL Process:

Now, create a BPEL process using the BPEL 2.0 specification (this is the default option).
As a best practice, ensure the BPEL process is asynchronous - this will ensure that the "long running" BPEL process doesn't hog threads.

In this case, since we are receiving a file from MFT, we will choose the "No Service" template to create a BPEL process with no interface. We will define this interface later with the MFT adapter.


Create MFT Adapter:

Drag and drop an MFT adapter to the "Exposed Services" swimlane of your SOA composite application, provide a name and choose "Service". Now, wire the MFT Adapter service and the File Adapter reference to the BPEL process we created. Your SOA composite should now show the MFT adapter service on the left, the BPEL process in the middle and the File Adapter reference on the right.



Processing large file in chunks:

In order to process the file in chunks, the BPEL invoke activity that calls the File Adapter must be placed within a while loop. During each iteration, the file adapter uses the header property values to determine where to start reading.

At a minimum, the following JCA adapter properties must be set:

jca.file.FileName : Send/Receive file name. This property overrides the adapter configuration. Very handy property to set / get dynamic file names
jca.file.Directory : Send/Receive directory location. This property overrides the adapter configuration
jca.file.LineNumber : Set/Get line number from which the file adapter must start processing the native file
jca.file.ColumnNumber : Set/Get column number from which the file adapter must start processing the native file
jca.file.IsEOF : File adapter returns this property to indicate whether end-of-file has been reached or not

Apart from the above, there are three other properties that help with error management & exception handling.

jca.file.IsMessageRejected : Returned by the file adapter if a message is rejected (non-conformance to the schema/not well formed)
jca.file.RejectionReason : Returned by the file adapter in conjunction with the above property. Reason for the message rejection
jca.file.NoDataFound : Returned by the file adapter if no data is found to be read

In the BPEL process "Invoke" activity, only the jca.file.FileName and jca.file.Directory properties are available to choose from the properties tab. We will have to configure the other properties manually.

First, let's create a bunch of BPEL variables to hold these properties. For simplicity, just create all variables with a simple XSD string type.
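As a sketch, the variable declarations in the BPEL 2.0 source could look like the following. The names lineNumber, columnNumber, isEOF and their return counterparts are the ones used later in this post; fileName and directory are names I assumed for illustration. The xsd prefix is assumed to be bound to the XML Schema namespace on the process element.

```xml
<variables>
  <!-- Sent to the File Adapter on each invoke -->
  <variable name="fileName" type="xsd:string"/>
  <variable name="directory" type="xsd:string"/>
  <variable name="lineNumber" type="xsd:string"/>
  <variable name="columnNumber" type="xsd:string"/>
  <!-- Drives the while loop condition -->
  <variable name="isEOF" type="xsd:string"/>
  <!-- Returned by the File Adapter after each invoke -->
  <variable name="returnLineNumber" type="xsd:string"/>
  <variable name="returnColumnNumber" type="xsd:string"/>
  <variable name="returnIsEOF" type="xsd:string"/>
</variables>
```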

Let's now configure the file adapter properties.

For input, we must first send filename, directory, line number and column number to the file adapter, so the first chunked read can happen. From the return properties (output), we will receive the new line number, column number, end-of-file properties which can be fed back to the adapter within a while loop.

Click on the "source" tab in the BPEL process and configure the following properties. Syntax shown below is for BPEL 2.0 spec, since we built the BPEL process based on BPEL 2.0.
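A minimal sketch of the invoke activity with its properties, assuming BPEL 2.0 and string variables named as in this post, might look like this. The invoke name, partner link, port type prefix and input/output message variables are hypothetical and will differ in your project; the bpelx prefix is assumed to be declared on the process element.

```xml
<invoke name="InvokeChunkedRead"
        partnerLink="ReadCustomers"
        portType="ns1:ChunkedRead_ptt"
        operation="ChunkedRead"
        inputVariable="chunkedReadInput"
        outputVariable="chunkedReadOutput">
  <!-- Values sent to the adapter: which file, and where to start reading -->
  <bpelx:toProperties>
    <bpelx:toProperty name="jca.file.FileName" variable="fileName"/>
    <bpelx:toProperty name="jca.file.Directory" variable="directory"/>
    <bpelx:toProperty name="jca.file.LineNumber" variable="lineNumber"/>
    <bpelx:toProperty name="jca.file.ColumnNumber" variable="columnNumber"/>
  </bpelx:toProperties>
  <!-- Values returned by the adapter: where the next chunk starts,
       and whether end-of-file has been reached -->
  <bpelx:fromProperties>
    <bpelx:fromProperty name="jca.file.LineNumber" variable="returnLineNumber"/>
    <bpelx:fromProperty name="jca.file.ColumnNumber" variable="returnColumnNumber"/>
    <bpelx:fromProperty name="jca.file.IsEOF" variable="returnIsEOF"/>
  </bpelx:fromProperties>
</invoke>
```

The error-handling properties (jca.file.IsMessageRejected, jca.file.RejectionReason, jca.file.NoDataFound) can be captured the same way with additional fromProperty entries.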

Note: In the BPEL 1.1 specification, the equivalent elements were bpelx:inputProperty & bpelx:outputProperty.

Drag & drop an assign activity before the while loop to initialize the variables for the first read (the first chunk), since we know the first chunk of data starts at line 1 and column 1.

lineNumber -> 1
columnNumber -> 1
isEOF -> 'false'
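In the BPEL 2.0 source, this initialization could be sketched as below. The assign name is made up, and the values are written as string literals because the variables were created as strings.

```xml
<assign name="AssignInitialPosition">
  <!-- The first chunk starts at line 1, column 1 -->
  <copy>
    <from>'1'</from>
    <to>$lineNumber</to>
  </copy>
  <copy>
    <from>'1'</from>
    <to>$columnNumber</to>
  </copy>
  <!-- Not at end-of-file yet, so the loop runs at least once -->
  <copy>
    <from>'false'</from>
    <to>$isEOF</to>
  </copy>
</assign>
```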

The file adapter must be invoked until end-of-file is reached, so the while loop condition should keep looping for as long as isEOF is 'false'.
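Assuming the isEOF string variable initialized above, the loop could be sketched in BPEL 2.0 as follows. The activity names are hypothetical, and the empty activity only stands in for the invoke and assign that belong inside the loop.

```xml
<while name="WhileNotEOF">
  <!-- Keep reading chunks until the adapter reports end-of-file -->
  <condition>$isEOF = 'false'</condition>
  <sequence name="ProcessChunk">
    <!-- 1. invoke the File Adapter ChunkedRead operation -->
    <!-- 2. process the returned chunk -->
    <!-- 3. re-assign the returned properties for the next iteration -->
    <empty name="Placeholder"/>
  </sequence>
</while>
```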

Within the while loop, drag & drop another assign activity to re-assign file adapter properties.

returnIsEOF -> isEOF
returnLineNumber -> lineNumber
returnColumnNumber -> columnNumber
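As a sketch in BPEL 2.0 source, with a made-up assign name, the re-assignment could look like this:

```xml
<assign name="AssignNextPosition">
  <!-- Carry the end-of-file flag into the loop condition variable -->
  <copy>
    <from>$returnIsEOF</from>
    <to>$isEOF</to>
  </copy>
  <!-- Feed the returned position back as the start of the next chunk -->
  <copy>
    <from>$returnLineNumber</from>
    <to>$lineNumber</to>
  </copy>
  <copy>
    <from>$returnColumnNumber</from>
    <to>$columnNumber</to>
  </copy>
</assign>
```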

This ensures that in the next iteration, the file adapter starts fetching records from where the previous read ended. For example, if you have a file with 500 records and a chunk size of 100, returnLineNumber will have a value of 101 after the first iteration is completed. The file adapter then starts reading the file from line number 101 instead of starting over.

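Your BPEL process should now contain the initializing assign, followed by the while loop that wraps the File Adapter invoke and the re-assign activity.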

We now have a BPEL process that receives a file reference from MFT and reads the large file in chunks.

Further processing, such as data shaping and transformation, can be done within the while loop.

4 comments:

  1. Hi Sathya, thanks for the good post with screenshots. I have a related question that I couldn't find an answer to yet:
    "When chunked read is implemented, I see that the WebLogic server creates a reference on the server so that the same file name is not re-read. Can we delete those files from WebLogic? Does it have its own cycle to delete these references? This is essentially an issue for large file transfers of around 1 GB, particularly in SOA 11.1.1.6."

  2. Hi
    I have a file where I need to get the elements count and value of ref from file.

    In file we have 3layers I need to ignore 2layers and work on 3rd layer where i get elements.
    1st layer isa
    2nd layer (gt
    3rd layer. (st(ins
    This is 1element N1
    Ref)
    This is 2element(ins
    Nm
    Dob...
    Ref)
    Se)
    Ge)
    Iea

  3. m not getting plz tell me in easy way...

  4. Hi Sathyam,

    I am trying same pattern, But line number & column number is same for every iteration. Can you please help me here. Here is my email 'sri.saileshkamma@gmail.com'. Let me know i can send my project.
