Process Flow Design
- Performance considerations based on different use cases
  - Files and Bulk Data processing
    - Use cases – Business data exchange with customers and partners (B2B), Data Integration (ETL), Self-service for business users
    - Technical needs – Many file transfers, Polling Events and Triggers, Secure Data Transfers, Complex data transformation rules, Data validations, Error notifications, Ability to fix errors and re-run files
    - Performance considerations – Ability to process very large files (larger than available memory), Apply complex rules efficiently, Ability to queue jobs, Efficient handling of events, Detailed logs for analysis, Recover failed jobs, Responsive UI
  - Real-time Transactions
    - Use cases – APIs and Web Services, Application Integration / ESB
    - Technical needs – Published Web Services, Real-time Events, Sub-second response times, Simpler data transformation rules, SLAs, Notifications
    - Performance considerations – Ability to process transactional data quickly, Handle a large volume of concurrent jobs; detailed logs and job recovery are typically not needed
Files and Bulk Data
- Attributes that impact performance and throughput per instance of Process Flow
  - Volume of Bulk Data
    - # of Records, # of Fields per Record, # of Megabytes
  - Complexity of Data Mapping Rules
    - Straight Maps vs. Complex Maps
    - # of Fields that have Complex Maps
  - Complexity of Data Validations
  - Calling External Programs Inside the Mapping Rules
  - # of Database Lookups
  - Whether source data is encrypted or compressed
- Design approaches for optimizing performance of files and bulk data
  - Use dynamic process flows – a single process flow can handle multiple, different sources, file types, etc.
  - Use fewer events to minimize polling, e.g. watch a parent folder with one event rather than one event per sub-folder (see the watcher sketch after this list)
  - Large File Data Ingestion (Streaming)
    - Design the process flow to use streaming and mark the data mapping to run in streaming mode (see the streaming sketch after this list)
  - File Splitting in Data Mapping
    - Divides bulk data into multiple blocks of X records each
    - Transformations/mappings of N blocks run in parallel; the resulting data sets are automatically concatenated into a single stream
    - Adeptia does this automatically; the user only specifies the values of X and N
    - For example: a 400MB text or XML file with half a million records can be divided into 100 blocks of 5,000 records (X) each, with ten blocks (N) processed in parallel at a time (see the block-splitting sketch after this list)
  - In mapping, cache DB lookup results (see the lookup-cache sketch after this list)
  - When loading millions of records into a target DB, generate a target file and use the database's bulk file-load utility in a custom plugin instead of the Advanced Database schema (see the bulk-load sketch after this list)
  - Set the PF log level to ERROR to reduce logging
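The event-consolidation point can be illustrated outside Adeptia with plain Java NIO: a single WatchService registered against a parent folder and all of its sub-folders replaces a separate poller per sub-folder. This is a minimal sketch; the path /data/inbound and the println hand-off are hypothetical stand-ins for a real trigger.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.*;
import java.util.stream.Stream;

import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;

public class ParentFolderWatcher {
    public static void main(String[] args) throws IOException, InterruptedException {
        Path parent = Paths.get("/data/inbound"); // hypothetical inbound root
        WatchService watcher = FileSystems.getDefault().newWatchService();

        // Register the parent and every existing sub-folder with the SAME watcher,
        // so one event loop replaces a separate poller per sub-folder.
        try (Stream<Path> dirs = Files.walk(parent)) {
            dirs.filter(Files::isDirectory).forEach(dir -> {
                try {
                    dir.register(watcher, ENTRY_CREATE);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }

        while (true) {
            WatchKey key = watcher.take(); // blocks until a file arrives under a watched folder
            for (WatchEvent<?> event : key.pollEvents()) {
                Path created = ((Path) key.watchable()).resolve((Path) event.context());
                System.out.println("New file: " + created); // hand off to a process flow here
            }
            key.reset();
        }
    }
}
```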
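The streaming idea itself is generic and can be sketched in plain Java: read, transform, and write one record at a time, so memory use stays flat regardless of file size. The file paths and the transform() body are assumptions for illustration; inside Adeptia the equivalent is the streaming setting on the process flow and mapping activities.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class StreamingIngestion {
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(
                 new FileReader("/data/inbound/orders.txt", StandardCharsets.UTF_8));
             BufferedWriter out = new BufferedWriter(
                 new FileWriter("/data/outbound/orders.txt", StandardCharsets.UTF_8))) {
            String record;
            while ((record = in.readLine()) != null) { // one record in memory at a time
                out.write(transform(record));
                out.newLine();
            }
        }
    }

    private static String transform(String record) {
        return record.toUpperCase(); // placeholder for real mapping rules
    }
}
```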
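Outside the product, the block-splitting pattern looks like the following plain-Java sketch: a fixed thread pool caps parallelism at N while records are mapped in blocks of X, and results are collected in submission order so the concatenated output matches the input order. The numbers mirror the example above (X = 5,000, N = 10); the record type and mapBlock() body are placeholders, not Adeptia internals.

```java
import java.util.*;
import java.util.concurrent.*;

public class BlockSplitter {
    static final int BLOCK_SIZE = 5_000; // X: records per block
    static final int PARALLELISM = 10;   // N: blocks processed at a time

    public static List<String> processAll(List<String> records) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(PARALLELISM);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int i = 0; i < records.size(); i += BLOCK_SIZE) {
                List<String> block = records.subList(i, Math.min(i + BLOCK_SIZE, records.size()));
                futures.add(pool.submit(() -> mapBlock(block)));
            }
            List<String> output = new ArrayList<>(records.size());
            for (Future<List<String>> f : futures) {
                output.addAll(f.get()); // collected in submission order, preserving record order
            }
            return output;
        } finally {
            pool.shutdown();
        }
    }

    private static List<String> mapBlock(List<String> block) {
        List<String> mapped = new ArrayList<>(block.size());
        for (String record : block) {
            mapped.add(record.trim()); // placeholder for real mapping rules
        }
        return mapped;
    }
}
```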
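For the lookup-cache recommendation, the pattern is ordinary memoization, sketched here with JDBC: each distinct key hits the database once and is served from memory for the rest of the mapping run. The country table and column names are hypothetical.

```java
import java.sql.*;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CachedLookup {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Connection conn;

    public CachedLookup(Connection conn) { this.conn = conn; }

    public String countryName(String code) {
        // In a mapping over a million rows with a few hundred distinct codes,
        // the query below runs a few hundred times instead of a million.
        return cache.computeIfAbsent(code, this::queryDatabase);
    }

    private String queryDatabase(String code) {
        String sql = "SELECT name FROM country WHERE code = ?"; // hypothetical lookup table
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, code);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : "";
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }
}
```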
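As an illustration of the bulk-load approach, the sketch below assumes a PostgreSQL target and the COPY interface of the pgjdbc driver; MySQL (LOAD DATA INFILE), Oracle (SQL*Loader), and SQL Server (BULK INSERT) offer equivalents. The connection details, table, and file path are hypothetical; the point is that one bulk call replaces millions of row-by-row INSERTs.

```java
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;

public class BulkLoadPlugin {
    public static void main(String[] args) throws Exception {
        // The mapping step has already written a flat CSV file; the plugin
        // streams it through COPY instead of inserting row by row.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost:5432/target", "user", "secret");
             FileReader csv = new FileReader("/data/staging/orders.csv")) {
            CopyManager copy = new CopyManager(conn.unwrap(BaseConnection.class));
            long rows = copy.copyIn(
                "COPY orders FROM STDIN WITH (FORMAT csv, HEADER true)", csv);
            System.out.println("Loaded " + rows + " rows");
        }
    }
}
```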
Real-time Transactions
- Design approaches for optimizing performance of APIs, Web Services and real-time transactions
  - Enable JDO Caching so that PF objects are not retrieved from the back-end DB for each instance of a PF run
  - Reduce the number of activities and steps in the flow by using custom plugins
  - Select the “Optimize for Real-Time” option in the PF properties to ensure minimal logging and disable PF recovery
  - Set the process flow Priority to IMMEDIATE in the PF properties so that, when triggered, it bypasses the queue
  - Ensure that DB connection pool settings are sized to support concurrent transaction volumes (see the pool-sizing sketch after this list)
  - For simpler data processing scenarios, use custom plugins for parsing and for data mapping/transformation instead of the Schema feature and Data Mapper; this avoids unnecessary overhead (see the plugin sketch after this list)
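A connection-pool configuration can be sketched with HikariCP, chosen here only as a familiar JDBC pool; any pool, including the one the platform ships with, exposes the same knobs. The numbers are assumptions meant to show the relationships, not recommended values.

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolSetup {
    public static HikariDataSource create() {
        HikariConfig cfg = new HikariConfig();
        cfg.setJdbcUrl("jdbc:postgresql://localhost:5432/backend"); // hypothetical URL
        cfg.setUsername("user");
        cfg.setPassword("secret");
        cfg.setMaximumPoolSize(50);      // >= expected peak concurrent PF instances hitting the DB
        cfg.setMinimumIdle(10);          // keep warm connections for bursty traffic
        cfg.setConnectionTimeout(2_000); // fail fast rather than silently blowing an SLA
        return new HikariDataSource(cfg);
    }
}
```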
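As a sketch of the last point: when the payload is trivially simple, the parse-and-map step of a custom plugin can be a few lines of plain Java, skipping the Schema parse and Data Mapper pass entirely. The pipe-delimited layout and field names below are assumptions for the example.

```java
public class LightweightTransform {
    // input:  "ORD123|49.99"   output: {"orderId":"ORD123","amount":49.99}
    public static String transform(String payload) {
        String[] fields = payload.split("\\|", 2);
        return "{\"orderId\":\"" + fields[0] + "\",\"amount\":"
                + Double.parseDouble(fields[1]) + "}";
    }

    public static void main(String[] args) {
        System.out.println(transform("ORD123|49.99"));
    }
}
```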