Blink and it’s another month gone. But Oracle has been very very busy as we come around for the latest quarterly updates and the details published. Plus a lot about Oracle Hospitality Integration Hub (OHIP) which builds upon OIC.
OIC has for some time now provided an FTP adaptor and more recently included a full FTP server capability. But both have limits on the file size and capacity. The file constraints (1GB for a file and 500GB for the FTP server) shouldn’t be an issue for day to day activities. But OIC is often used to support SaaS Financials and other cloud solutions which do have monthly process cycles which can generate significant data volumes, for example, payroll data. The question is how to handle such data with such constraints?
There is a school of thought that points to the possibility when handling such large data volumes we should consider using Data Integration rather than a more event-centric integration tool. Personally, I think there is a lot of validity in the argument, and anyone dealing with such bulky data activities should review and question if it is a better answer.
That said, there are cases where it does stand-up. For example:
- If an organization is transitioning to a more event-driven or at least micro-batch model, you have to start the transition somewhere, but trying to line up changes everywhere can be problematic, so we have to start somewhere. Building an integration process so you have an event model developed, but in the interim, you need to take that bulk mechanism and convert it to a small stream of events.
- You may be working with a bulk data extract and only need a small subset of the data provided, it won’t help if the data is also represented using a verbose notation such as XML.
How to overcome the constraint? Oracle databases aren’t so constrained, and SQLLoader can provide an easy means to ingest the data into a staging table. The benefit of this is:
- if you’re only needing a subset of the data you can pull just those columns from the table.
- the bulk of the use of XML to be self-describing can be shed through using the DB schema as being more prescriptive.
- SQL scripts can handle the checksum records removing that data and overhead from the integration process, leaving you to concentrate more on the business process.
If the data is still substantial once in the database there are a number of strategies to consume the data in more manageable chunks, such as
- Running SQL script on the database that takes each row and calls the OIC as a restful API point. This approach is potentially very interesting as it may then mean if you’re moving towards an event process in the future the API endpoint represents the future state and the database stored procedure is mimicking the future client behaviour.
- Use polling strategies and result set limits to control how much data is processed in a single execution of the integration. This approach does mean the integration needs to tag which records have been processed to avoid re-reading them.
Enriching an integration from data in a database or DBaaS (Database as a Service) is not an unusual requirement. Many integration use cases today need to access a database that is on-premises. The means to connect to the database is fairly obvious – the connection agent. Our book goes into a lot more detail as to why that is, and the implications of using database connections.
However when it comes to Oracle’s DBaaS a service it would be very easy to assume that given that you’re using two different parts of Oracle’s PaaS that it would be straight forward to connect the two together without an agent. However, at least today whether its on-premises or DBaaS you need to use a connection agent. This does mean that you’ll need an IaaS node to host the connection agent.
This quirk is driven by the fact that there are some scenarios that this does actually make sense. For example – the Oracle domains need to have a high level of isolation, so when the DBaaS is in another domain then the decoupling via the agent makes sense. When your database is in a different zone of the cloud – then you’re running DB calls across what is effectively a wide are network – not good.