Wednesday, April 18, 2012

Apache Tika - File Content Analyser

If you work with files and documents day in, day out and need to detect their content type for any reason, Apache Tika can save your day.

A valid large-enterprise use case I can think of:

A system needs to read through a few hundred thousand documents of a few hundred distinct MIME types every day, validate and process them, and push them to subscribed interfaces via various protocols.

From an architecture/design perspective, I would like the solution to have extensibility built in. A single file reader that can handle multiple file formats means I don't have to spend time and effort incorporating a new file format in the near future. It would also be prudent to validate file formats before consuming them, by way of pre-processors responsible for detecting the content type before reading. Rejecting a bad file up front saves a lot of processing time.
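A minimal sketch of that design in Java, assuming a pluggable detector that maps a file to a MIME type and one handler per type. All names here (FormatRouter, DocumentHandler, register, process) are hypothetical and not part of any library:

```java
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical names throughout; this illustrates the design, not a library API.
final class FormatRouter {
    /** Reads one document of a known MIME type and returns a short summary. */
    interface DocumentHandler {
        String handle(Path file);
    }

    // The pre-processor: maps a file to its detected MIME type.
    private final Function<Path, String> detector;
    private final Map<String, DocumentHandler> handlers = new LinkedHashMap<>();

    FormatRouter(Function<Path, String> detector) {
        this.detector = detector;
    }

    /** Supporting a new format means registering one more handler. */
    FormatRouter register(String mimeType, DocumentHandler handler) {
        handlers.put(mimeType, handler);
        return this;
    }

    /** Detects the type first, then reads only if a handler is registered for it. */
    Optional<String> process(Path file) {
        String mimeType = detector.apply(file);
        DocumentHandler handler = handlers.get(mimeType);
        return handler == null ? Optional.empty() : Optional.of(handler.handle(file));
    }
}
```

The detector is passed in as a function so that a content-based detector can be plugged in without touching the handlers.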

Consider a scenario where a huge video file is disguised as a PDF simply by changing its extension. It would not be wise for the solution to read only the file extension and presume the file format is valid.

Apache Tika does exactly that. It detects the MIME type of a file from its content rather than its extension, and it is not very invasive: detection typically needs only the leading bytes of the file (plus hints such as the file name), so it doesn't read the entire content of a 5 GB file to work out the content type. In its own words, this open source project "detects and extracts metadata and structured text content from various documents using existing parser libraries". The downside is that it is a bit heavy (a ~25 MB library). The benefit is that it can read, detect and parse a wide variety of file contents and report a specific MIME type, which is not the case with many of the Java APIs out there.

More information on the APIs and documentation is available on the Apache Tika project site.
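To illustrate why content-based detection is cheap, here is a minimal sketch in plain Java that looks at only the first 12 bytes of a file. It is not Tika's implementation: the class MagicSniffer and its methods are hypothetical, it knows just two signatures, and mapping the "ftyp" marker to video/mp4 is a simplification (other ISO base media formats use the same marker).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Locale;

// Illustration only: two hand-picked signatures, not Tika's detection logic.
final class MagicSniffer {
    private static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] FTYP = "ftyp".getBytes(StandardCharsets.US_ASCII);
    private static final int HEAD_SIZE = 12;

    /** Guesses a MIME type from the leading bytes of the data. */
    static String sniff(byte[] head) {
        if (startsWith(head, 0, PDF)) return "application/pdf";
        // MP4-family files usually carry the marker "ftyp" at byte offset 4.
        if (startsWith(head, 4, FTYP)) return "video/mp4";
        return "application/octet-stream";
    }

    /** True when the name claims PDF but the content does not start like one. */
    static boolean isDisguisedPdf(String fileName, byte[] head) {
        boolean namedPdf = fileName.toLowerCase(Locale.ROOT).endsWith(".pdf");
        return namedPdf && !"application/pdf".equals(sniff(head));
    }

    /** Reads at most HEAD_SIZE bytes, however large the file is. */
    static byte[] readHead(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[HEAD_SIZE];
            int filled = 0;
            int read;
            while (filled < head.length
                    && (read = in.read(head, filled, head.length - filled)) > 0) {
                filled += read;
            }
            return Arrays.copyOf(head, filled);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static boolean startsWith(byte[] data, int offset, byte[] prefix) {
        if (data.length < offset + prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[offset + i] != prefix[i]) return false;
        }
        return true;
    }
}
```

For a file on disk, the check would be MagicSniffer.isDisguisedPdf(file.getFileName().toString(), MagicSniffer.readHead(file)). In a real solution the sniff step is where a library such as Tika would be called instead.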


I happened to use this open source library in one of my implementations and found it really helpful. I thought it might help someone out there with similar requirements :)

Tuesday, April 17, 2012

BPEL ora:countNodes() XPath function usage

Occasionally I have hit a problem where the BPEL engine fails to execute the countNodes() function, reporting the following error:

javax.xml.xpath.XPathExpressionException: Failed to execute countNodes() function : oracle.xml.parser.v2.XMLNodeList
        at oracle.xml.xpath.JXPathExpression.evaluate(JXPathExpression.java:242)
        at com.collaxa.cube.xml.xpath.BPELXPathUtil.evaluate(BPELXPathUtil.java:247)

Since the BPEL XPath expression builder allows declarative development, it is very tempting and easy to pick the relevant function and insert the BPEL variable from the panels for evaluation. However, for a function such as ora:countNodes(), the catch is that it doesn't expect bpws:getVariableData(blah blah blah) as an argument. Rather, the XPath expression of the variable has to be directly fed into it for proper evaluation.

The BPEL XPath function documentation for ora:countNodes() gives the following signature:

 ora:countNodes(variableName, partName?, locationPath?)

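Hence, ora:countNodes(bpws:getVariableData(variableName, partName?, locationPath?)) will throw the above error. Instead, pass the variable name, part name and location path straight to the function, for example ora:countNodes('inputVariable', 'payload', '/ns1:orders/ns1:order') (the variable, part and path names here are hypothetical). It is a very simple thing, but it can easily escape unsuspecting eyes :)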

Thursday, April 12, 2012

SOA Suite 11.1.1.6 (PS5) Released !!

Oracle SOA Suite 11.1.1.6 (PS5) is now generally available and the binaries can be downloaded from OTN.

This release is promising and offers better features than the previous PS4 release. Listed below are a few cool features that I found really helpful; the complete list is in Oracle's list of new features for this release.

BPEL:
  • New delivery/persistence policy for BPEL processes (async.persist, async.cache & sync)
  • Skip conditions, similar to the ones we were used to in BPM user assignments, let an XPath expression decide whether an activity in a BPEL process is skipped, which can remove much of the 'switch-case' design in most scenarios
  • Assertions are enabled on request-response activities such as receive, invoke and reply, where XPath conditions can be asserted to throw faults when they fail. They can be evaluated before or after the invocation, as pre-assert and post-assert respectively
Human Task:
  • For human tasks, the much-awaited dynamic assignment and task escalation can now be performed from the user task UI, as this dialog can now be exposed in the user interface
Business Rules:
  • Business Rules now support the usual programming constructs (if-then-else, elseif, do-while, while etc.), which were long awaited. This is a great boon, as earlier releases didn't let users define else conditions; these had to be handled tactically using 'if'-only conditions, which was a challenge in many complex scenarios
  • Usability and performance have greatly improved for editing business rules from within the composer
Business Activity Monitoring (BAM):
  • BAM is now supported on IBM WebSphere
  • BAM reports based on external data objects can now be refreshed via a timer, which removes the need to reload the entire report/view
  • In B2B-BAM integration scenarios we often faced error handling challenges at the EMS (Enterprise Messaging Service) layer; error handling at this layer is now offered as part of this release
DB, File/FTP Adapters:
  • Optimizations for the DB and File/FTP adapters, such as Coherence integration, are claimed to greatly improve performance, especially during large payload interactions
Enterprise Manager (EM):
  • Alerts are displayed in the EM console for messages stuck in asynchronous transactions, and these can be recovered from within the console
  • Enhanced composite instance search provides better search and usability for IT administrators

Wednesday, April 11, 2012

Oracle B2B Inbound Outbound Testing

I thought I would note down the steps for testing B2B inbound and outbound transactions using the Oracle B2B gateway.

For inbound transactions, perform the following:

1. Create a host trading partner
2. Create an external trading partner
3. Go to Administration and create a listening channel - Note that the listening channel configured here (for example, a folder in the file system) is meant for inbound transactions, where Oracle B2B listens for and picks up the files
4. Create a trading partner agreement at the external trading partner level by providing the document definition and channel details
5. After all these configurations, place a file (valid against the document definition) on the listening channel. The B2B engine picks it up and sends it to the trading partner (depending on the contract established under the trading partner agreement) on the appropriate channel. Note that the file must follow this naming convention:

<fromtradingpartner>_<uniquenumber/timestamp>.<anyfileextn>

For testing outbound transactions:

1. Configure the channel details at the trading partner level (Channel tab). This is the channel the resulting file will be sent/written to
2. Disable the 'Translate' option under the trading partner agreement, as we do not intend to translate the format of the file when simulating/testing outbound transactions. This option can be enabled if we are processing the file contents through the SOA layer
3. Go to Administration -> Listening Channel -> create a separate listening channel where the outbound file should be read from, and check the 'Internal' option under the 'Channel Attributes' section - This tells the B2B gateway that the channel is configured for outbound transactions
4. Ensure that the file name for testing outbound transactions follows this convention:

<totradingpartner>_<doctype>_<doctyperevision>_<msgtype>_<msgid>.<anyfileextn>

5. Now place the sample file, named as above, on the listening channel configured in step 3. This simulates the outbound transaction and writes/sends the file to the trading partner channel
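
When running these tests repeatedly, building the file names by hand gets error-prone. Below is a small Java sketch that builds names following the two conventions above and copies a sample payload into a listening folder. The class B2bTestFileNames and its methods are hypothetical helpers; only the name patterns come from this post.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for test runs; only the name patterns come from the post.
final class B2bTestFileNames {
    /** Builds <fromtradingpartner>_<uniquenumber/timestamp>.<anyfileextn> */
    static String inbound(String fromTradingPartner, long uniqueNumber, String extension) {
        return fromTradingPartner + "_" + uniqueNumber + "." + extension;
    }

    /** Builds <totradingpartner>_<doctype>_<doctyperevision>_<msgtype>_<msgid>.<anyfileextn> */
    static String outbound(String toTradingPartner, String docType, String docTypeRevision,
                           String msgType, String msgId, String extension) {
        return String.join("_", toTradingPartner, docType, docTypeRevision, msgType, msgId)
                + "." + extension;
    }

    /** Copies a sample payload into a listening-channel folder under the given name. */
    static Path drop(Path samplePayload, Path listeningFolder, String fileName)
            throws IOException {
        return Files.copy(samplePayload, listeningFolder.resolve(fileName),
                StandardCopyOption.REPLACE_EXISTING);
    }
}
```

For example, B2bTestFileNames.outbound("Acme", "850", "4010", "1", "1001", "dat") gives Acme_850_4010_1_1001.dat. The partner name, document type, revision, message type and message id used here are placeholders; substitute the values from your own trading partner agreement.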