Performance considerations based on different use cases
Files and Bulk Data Processing
Use cases – Business data exchange with customers and partners (B2B), Data Integration (ETL), Self-service for business users
Technical needs – High volume of file transfers, Polling Events and Triggers, Secure Data Transfers, Complex data transformation rules, Data validations, Error notifications, Ability to fix errors and re-run files
Performance considerations – Ability to process very large files (larger than available memory), Apply complex rules efficiently, Ability to queue jobs, Efficient handling of events, Detailed logs for analysis, Recover failed jobs, Responsive UI
Real-Time Transactions
Use cases – APIs and Web Services, Application Integration / ESB
Technical needs – Published Web Services, Real-time Events, Sub-second response times, Simpler data transformation rules, SLAs, Notifications
Performance considerations – Ability to process transactional data quickly, Handle a large volume of concurrent jobs, Detailed logs not required, Job recovery not required
Files & Bulk Data
Attributes that impact performance and throughput per instance of Process Flow
Volume of Bulk Data
# of Records, # of Fields per Record, # of Megabytes
Complexity of Data Mapping Rules
Straight Maps vs. Complex Maps
# of Fields that have Complex Maps
Complexity of Data Validations
Calling External Programs Inside the Mapping Rules
# of Database Lookups
Source data encrypted or compressed
Design approaches for optimizing performance of files and bulk data
Use dynamic process flows – one process flow can handle multiple sources, file types, etc. (see the sketch below)
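The idea behind a dynamic flow can be sketched in plain Java: a single entry point inspects the incoming file's type and dispatches to the matching handler, rather than maintaining a separate flow per source. The handler names and the routing key (file extension) below are hypothetical, not Adeptia APIs.

```java
import java.util.Map;
import java.util.function.Function;

// One entry point routes any incoming file to the right handler
// instead of one flow per source type. Handlers are placeholders.
public class DynamicFlowRouter {

    private static final Map<String, Function<byte[], String>> HANDLERS = Map.of(
            "csv",  DynamicFlowRouter::handleCsv,
            "xml",  DynamicFlowRouter::handleXml,
            "json", DynamicFlowRouter::handleJson);

    public static String route(String fileName, byte[] payload) {
        String ext = fileName.substring(fileName.lastIndexOf('.') + 1).toLowerCase();
        Function<byte[], String> handler = HANDLERS.get(ext);
        if (handler == null) {
            throw new IllegalArgumentException("No handler for file type: " + ext);
        }
        return handler.apply(payload);
    }

    private static String handleCsv(byte[] data)  { return "parsed CSV";  }
    private static String handleXml(byte[] data)  { return "parsed XML";  }
    private static String handleJson(byte[] data) { return "parsed JSON"; }
}
```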
Use fewer events to minimize polling overhead, e.g. watch a parent folder with one event rather than defining a separate event per sub-folder (see the sketch below)
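A minimal sketch of the principle, using a generic Java poller rather than Adeptia's event implementation: one scheduled task walks the parent folder tree, so a single event covers every sub-folder. The folder path and polling interval are illustrative.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

// One scheduled poller scanning a parent directory tree for files,
// instead of a separate polling event per sub-folder.
public class ParentFolderPoller {
    public static void main(String[] args) {
        Path parent = Paths.get("/data/inbound");   // hypothetical parent folder
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try (Stream<Path> files = Files.walk(parent)) {   // covers all sub-folders
                files.filter(Files::isRegularFile)
                     .forEach(f -> System.out.println("Trigger flow for: " + f));
            } catch (IOException e) {
                e.printStackTrace();
            }
        }, 0, 30, TimeUnit.SECONDS);                // one event, 30s poll interval
    }
}
```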
Large File Data Ingestion (Streaming)
Design the process flow to use streaming and mark the data mapping to run in streaming mode (see the sketch below)
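Streaming keeps memory use constant regardless of file size, which is what lets a flow process files larger than available memory. A minimal Java sketch of the pattern (not Adeptia's internal implementation): read, transform, and write one record at a time. File paths and the transform are illustrative.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Streams a file record by record so memory use stays constant
// regardless of file size -- the principle behind streaming mode.
public class StreamingIngest {
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(Paths.get("big-input.txt"));
             BufferedWriter out = Files.newBufferedWriter(Paths.get("output.txt"))) {
            String record;
            while ((record = in.readLine()) != null) {   // one record in memory at a time
                out.write(record.toUpperCase());         // placeholder transformation
                out.newLine();
            }
        }
    }
}
```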
File Splitting in Data Mapping
Ability to divide bulk data into multiple blocks of X records each
N blocks are transformed/mapped in parallel, and the resulting data sets are automatically concatenated into a single stream
Adeptia does this automatically; the user just specifies values for X and N (see the sketch after this list)
For example: a 400MB text or XML file with half a million records can be divided into 100 blocks of 5,000 records (X) each, with ten blocks (N) processed in parallel at a time.
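A generic Java sketch of the mechanics Adeptia automates: split the records into blocks of X, map up to N blocks in parallel, then concatenate the results in the original order. The values of X and N and the placeholder mapping are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Split/parallel-map/concatenate: blocks of X records are mapped on
// N worker threads and results are stitched back in input order.
public class BlockSplitter {
    static final int X = 5_000;  // records per block
    static final int N = 10;     // blocks processed in parallel

    public static List<String> mapInParallel(List<String> records) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(N);
        List<Future<List<String>>> futures = new ArrayList<>();
        for (int i = 0; i < records.size(); i += X) {
            List<String> block = records.subList(i, Math.min(i + X, records.size()));
            futures.add(pool.submit(() ->
                    block.stream().map(String::toUpperCase).toList())); // placeholder mapping
        }
        List<String> result = new ArrayList<>(records.size());
        for (Future<List<String>> f : futures) {
            result.addAll(f.get());   // futures are ordered, so output order is preserved
        }
        pool.shutdown();
        return result;
    }
}
```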
In mapping, cache DB lookup results so repeated keys do not hit the database for every record (see the sketch below)
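A minimal sketch of the caching pattern, assuming a JDBC lookup with hypothetical table and column names: each distinct key queries the database once, and every later record reuses the cached value.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Caches lookup results so each distinct key is queried only once
// per run instead of once per record.
public class CachedLookup {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Connection conn;

    public CachedLookup(Connection conn) { this.conn = conn; }

    public String lookup(String key) {
        return cache.computeIfAbsent(key, k -> {
            try (PreparedStatement ps =
                     conn.prepareStatement("SELECT name FROM ref_codes WHERE code = ?")) {
                ps.setString(1, k);
                try (ResultSet rs = ps.executeQuery()) {
                    // empty string marks "not found" so misses are cached too
                    return rs.next() ? rs.getString(1) : "";
                }
            } catch (SQLException e) {
                throw new RuntimeException(e);
            }
        });
    }
}
```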
When loading millions of records into a target DB, instead of using the Advanced Database Schema, generate a target file and load it with the database's bulk file loader from a custom plugin (see the sketch below)
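A hedged sketch of what such a custom plugin might do, using MySQL's LOAD DATA LOCAL INFILE as the example bulk loader (PostgreSQL COPY, Oracle SQL*Loader, and SQL Server BULK INSERT are the equivalents). Connection details, table name, and staging path are illustrative.

```java
import java.io.BufferedWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.List;

// Writes records to a staging file and hands it to the database's
// bulk loader, which is far faster than row-by-row inserts.
public class BulkFileLoad {
    public static void load(List<String> csvRows) throws Exception {
        Path staging = Paths.get("/tmp/target_load.csv");
        try (BufferedWriter w = Files.newBufferedWriter(staging)) {
            for (String row : csvRows) {
                w.write(row);
                w.newLine();
            }
        }
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:mysql://localhost/etl?allowLoadLocalInfile=true", "user", "pass");
             Statement stmt = conn.createStatement()) {
            stmt.execute("LOAD DATA LOCAL INFILE '" + staging
                    + "' INTO TABLE target_table FIELDS TERMINATED BY ','");
        }
    }
}
```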
Set the PF log level to ERROR to reduce logging overhead
Real-Time Transactions
Design approaches for optimizing performance of APIs, Web Services, and real-time transactions
Enable JDO caching so that PF objects are not retrieved from the back-end DB for each instance of a PF run
Reduce the number of activities and steps in the flow by consolidating logic into custom plugins
Select the “Optimize for Real-Time” option in the PF properties to minimize logging and disable PF recovery
Set the process flow Priority to IMMEDIATE in the PF properties so that, when triggered, the flow bypasses the queue
Ensure that DB connection pool settings are sized to support the expected concurrent transaction volume (see the sketch below)
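Adeptia's own pool is tuned through its server properties, but the sizing principle can be illustrated with a generic HikariCP configuration. The URL, credentials, and values below are assumptions, not Adeptia settings.

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

// Generic illustration of aligning a JDBC connection pool
// with the expected transaction concurrency.
public class PoolSetup {
    public static HikariDataSource create() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost/backend");
        config.setUsername("user");
        config.setPassword("pass");
        config.setMaximumPoolSize(50);      // >= peak concurrent transactions
        config.setMinimumIdle(10);          // keep warm connections for bursts
        config.setConnectionTimeout(2_000); // fail fast under sub-second SLAs
        return new HikariDataSource(config);
    }
}
```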
For simpler data processing scenarios, use custom plugins for parsing and data mapping/transformation instead of the Schema and Data Mapper features; this avoids unnecessary overhead for straightforward formats (see the sketch below)
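For illustration, a simple fixed-layout transform of the kind a custom plugin could perform in a few lines of Java, bypassing Schema and Data Mapper overhead. The record layouts are hypothetical and the Adeptia plugin wrapper is omitted.

```java
// For a simple fixed layout, a few lines of Java inside a custom plugin
// can replace a Schema + Data Mapper pair. Input layout: id|name|amount.
public class SimpleTransform {
    // "1001|Acme Corp|250.00"  ->  "Acme Corp,1001,275.00" (amount + 10% tax)
    public static String transform(String record) {
        String[] f = record.split("\\|");
        double amountWithTax = Double.parseDouble(f[2]) * 1.10;
        return String.format("%s,%s,%.2f", f[1], f[0], amountWithTax);
    }

    public static void main(String[] args) {
        System.out.println(transform("1001|Acme Corp|250.00"));
    }
}
```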