Data Engineer and Data Science Internships with airisDATA in Princeton, NJ
The airisDATA Data Science internship program gives candidates the chance to work with our teams of data scientists, data engineers, architects and software engineers on challenging problems running on our state-of-the-art Spark cluster. You will work on real problems as part of a passionate team.
You will evaluate open source and internally developed modeling and analytics tools written in Python, Java or Scala. You will have the opportunity to work with Spark, H2O, DeepLearning, SystemML, Redis, Hadoop, HBase, TitanDB, Cassandra, GraphX, SparkML, Apache Ignite, Data Flow, NiFi and other NoSQL and Big Data tools.
You will share your results and insights with data science teams via our external blog, GitHub and our meetup, as well as for internal use. You will give presentations at our NJ Data Science Meetup at Princeton University.
You will write tools, wrappers, and scripts to help teammates perform their jobs more efficiently and effectively. For example, you might build a standard pipeline that runs a grid search over model parameters and then flows straight into a series of diagnostics on how well the resulting model performs.
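As a rough illustration of the grid-search-then-diagnostics pipeline mentioned above, here is a minimal sketch in plain Python. The toy one-dimensional ridge model, the synthetic data, and the names `fit_ridge_1d`, `grid_search`, `diagnostics` and `run_pipeline` are all assumptions made for this example; they are not airisDATA tooling, and a real pipeline would more likely be built on Spark or a machine learning library.

```python
import itertools
import random
import statistics


def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge regression, no intercept: w = sum(x*y) / (sum(x*x) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return statistics.fmean((y - w * x) ** 2 for x, y in zip(xs, ys))


def grid_search(train, valid, grid):
    """Fit one model per parameter combination; keep the lowest validation MSE."""
    best = None
    names = sorted(grid)
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        w = fit_ridge_1d(*train, **params)
        score = mse(w, *valid)
        if best is None or score < best["valid_mse"]:
            best = {"params": params, "weight": w, "valid_mse": score}
    return best


def diagnostics(w, xs, ys):
    """Summarize how well the chosen model performs on held-out data."""
    residuals = [y - w * x for x, y in zip(xs, ys)]
    mean_y = statistics.fmean(ys)
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return {
        "mse": ss_res / len(ys),
        "mae": statistics.fmean(abs(r) for r in residuals),
        "r2": 1.0 - ss_res / ss_tot,
    }


def run_pipeline(seed=0):
    """Generate synthetic data (y = 3x + noise), search the grid, then run diagnostics."""
    rng = random.Random(seed)
    xs = [rng.uniform(-5, 5) for _ in range(200)]
    ys = [3.0 * x + rng.gauss(0, 1) for x in xs]
    train = (xs[:150], ys[:150])
    valid = (xs[150:], ys[150:])
    best = grid_search(train, valid, {"lam": [0.0, 0.1, 1.0, 10.0, 100.0]})
    report = diagnostics(best["weight"], *valid)
    return best, report


if __name__ == "__main__":
    best, report = run_pipeline()
    print("best parameters:", best["params"])
    print("diagnostics:", report)
```

The point of the design is that the search step hands its winning model directly to the diagnostics step, so teammates get a parameter choice and a performance report from a single call.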
You will write software to clean and investigate large, messy data sets of numerical and textual data. You will work with real data sets as well as generated and open source data sets.
You will integrate with external data sources and APIs to discover interesting trends and to work on Kaggle problems.
You will design rich data visualizations to communicate complex ideas to customers and company leadership.
You will work on real-world problems from clients and internal use cases.
We are looking for candidates for our Princeton office who are smart and curious, have some knowledge of data science, and have experience with some of the following: Python, Java, Spark, SQL, Ruby, Chef, Puppet, Hadoop and Linux. You should have some knowledge of SQL and NoSQL databases. You have hands-on experience and want to solve problems hands-on. You must be interested in solving hard problems with big data and new tools. We are constantly using the latest tools from Stanford, MIT and the open source community.
You should have completed, or be near completing, a Master's or PhD in a quantitative field of study, computer science or math. You must have some experience with programming and machine learning.
You should know statistical algorithms such as Bayesian methods, decision trees, random forests, neural networks, classifiers, clustering and regression.
For Data Engineering, candidates should have stronger programming skills and knowledge of data structures and algorithms; machine learning experience is not required.
These would be full-time positions in our Princeton, New Jersey office.