Defining project requirements, building SQL/Hive databases, automating jobs, using analytical tools to identify trends and predictive characteristics, developing and automating reports, visualizing data, and exploring new analytical and big data tools.
The candidate must possess clear communication skills to work in a highly collaborative, fast-paced team environment.
A successful candidate must be able to understand complex business problems and ensure projects leverage the appropriate technology and analytical tools to deliver a comprehensive solution.
Bachelor’s Degree in Computer Science, Engineering, Mathematics, Statistics or a related field
1+ year of experience in a data-oriented role working in a multi-disciplinary team
Master’s Degree in Computer Science, Engineering, Mathematics, Statistics or a related field
Familiarity and experience with tools such as Python, Hadoop, TensorFlow, scikit-learn, SQL, d3.js, MapReduce, Tableau, or R (a minimal Python sketch follows this list)
Previous experience working with and building for location-aware services, and familiarity with web service development and API integration across multiple systems
A portfolio of open-source contributions, personal projects, presentations, or other work that shows you are passionate about a subject, took the initiative to learn it, and applied it to a real problem
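For illustration only, a minimal sketch of the kind of analysis named above (using Python and scikit-learn to surface predictive characteristics); the file path and column names are hypothetical placeholders, and the feature columns are assumed to be numeric.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical dataset and target column; numeric features assumed.
df = pd.read_csv("customer_activity.csv")
features = df.drop(columns=["churned"])
target = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

print(f"held-out accuracy: {model.score(X_test, y_test):.3f}")

# Rank features by how much they contribute to the model's predictions.
importances = pd.Series(model.feature_importances_, index=features.columns)
print(importances.sort_values(ascending=False).head(10))
```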
Design, build, optimize, launch and support new and existing data models and ETL processes in production
Interface with engineers, product managers and product analysts to understand data needs.
Manage and verify data accuracy for the Hadoop cluster.
Responsible for supporting the Hadoop cluster environment, which includes Hive, Spark, HBase, Presto, etc.
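For illustration only, a minimal sketch of the kind of ETL process described in the responsibilities above, assuming a Spark session with Hive support; the database and table names are hypothetical placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("daily_page_view_rollup")
    .enableHiveSupport()  # read from and write to the Hive metastore
    .getOrCreate()
)

# Extract: read raw events from a Hive table (hypothetical name).
raw = spark.table("raw_db.page_views")

# Transform: aggregate views per page per day.
daily = (
    raw.groupBy("page_id", F.to_date("event_ts").alias("event_date"))
       .agg(F.count("*").alias("view_count"))
)

# Load: overwrite the rollup table consumed by analysts (hypothetical name).
daily.write.mode("overwrite").saveAsTable("analytics_db.daily_page_views")
```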
BS degree or equivalent experience in Computer Science or a related field
2+ years of experience in custom ETL design, implementation, and maintenance on Hadoop clusters
2+ years of hands-on coding experience
Understanding of the Hadoop ecosystem, including HDFS, YARN, MapReduce, ZooKeeper, Kafka, HBase, Spark, and Hive
Strong SQL skills, especially in the area of data aggregation
Good understanding of distributed systems and basic mathematics such as statistics and probability
Comfortable with Git version control
At least 2 years' experience architecting and designing infrastructure on AWS
Experience building real-world data pipelines
Automation skills with tools such as Airflow, Python, and Bash
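For illustration only, a minimal sketch of the kind of pipeline automation named in the last requirement, using Airflow's Bash and Python operators; the DAG id, schedule, start date, and commands are hypothetical placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def validate_row_counts():
    """Placeholder check that the day's load produced a non-empty table."""
    # In a real pipeline this would query Hive/Presto and fail the task
    # if the row count falls below an expected threshold.
    pass


with DAG(
    dag_id="daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_etl = BashOperator(
        task_id="run_spark_etl",
        bash_command="spark-submit /opt/jobs/daily_page_view_rollup.py",
    )
    check_load = PythonOperator(
        task_id="validate_row_counts",
        python_callable=validate_row_counts,
    )

    run_etl >> check_load  # run the ETL job, then validate its output
```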