Data Pipeline Acceleration for VaR / On-Demand Risk Management:  

Apache Spark was chosen as the distributed computation framework for implementing the VaR solution on top of the Hadoop Distributed File System (HDFS). We are developing the application in Scala to make full use of multi-core CPUs. Reference data for the current date is stored as a cached RDD so that it can be reused across multiple runs, which may be launched by different users. To share it between those runs, we use the in-memory file system from Apache Ignite and store the data as a shared RDD. We also cache the scenario files generated for previous runs, and we store any scenario files generated for other dates on HDFS in Parquet format. Position data will be broadcast to all nodes, which should reduce the network I/O incurred by lookups and joins on the position data.
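The caching, broadcast, and Parquet steps described above can be sketched in Scala roughly as follows. This is a minimal illustration and not the production code: the `ReferenceRow` shape, the `computeExposures` helper, the sample values, and the temporary output path are all hypothetical, and it assumes Spark 2.x or later with `spark-sql` on the classpath. Spark's own `cache()` is visible only within one Spark application, so sharing across users and applications would go through the Ignite shared RDD layer mentioned above, which is not shown here.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object VarPipelineSketch {
  // Hypothetical reference-data record; the real schema is not given in the text.
  final case class ReferenceRow(instrumentId: String, price: Double)

  // Joins reference data against broadcast positions without a shuffle.
  def computeExposures(
      spark: SparkSession,
      positions: Map[String, Double],
      reference: Seq[ReferenceRow]): Map[String, Double] = {
    import spark.implicits._

    // Reference data for the current date, cached for reuse by later
    // actions in the same Spark application.
    val referenceData = reference.toDS().cache()

    // Position data is shipped once to every executor as a broadcast variable.
    val positionsBc = spark.sparkContext.broadcast(positions)

    // Each task looks positions up in its local broadcast copy,
    // so the lookup itself moves no data over the network.
    referenceData
      .map(r => (r.instrumentId, r.price * positionsBc.value.getOrElse(r.instrumentId, 0.0)))
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("var-pipeline-sketch")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    val exposures = computeExposures(
      spark,
      Map("A" -> 100.0, "B" -> -50.0),
      Seq(ReferenceRow("A", 10.0), ReferenceRow("B", 20.0)))

    // Persist results in Parquet format; a temp directory stands in
    // for an HDFS location such as a per-date scenario folder.
    val outputPath = Files.createTempDirectory("scenarios").resolve("out").toString
    exposures.toSeq.toDF("instrumentId", "exposure")
      .write.mode("overwrite").parquet(outputPath)

    // Reading the Parquet files back yields the same rows.
    spark.read.parquet(outputPath).show()

    spark.stop()
  }
}
```

Broadcasting suits position data only while it fits comfortably in each executor's memory. For larger position sets, a regular join, or a DataFrame join with a broadcast hint on the smaller side, would be the usual alternative.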

Copyright © 2016. airis.DATA. All Rights Reserved.

Parquet, Avro, Kafka, Apache Hadoop, Apache Spark and the Apache Spark Logo are trademarks of the Apache Software Foundation.