PlumResearch is looking for a Data Processing Engineer who can help us maintain our existing systems and develop new, effective solutions for processing our Big Data.
We’re looking for an engineer whose primary goal will be working closely with the current team and management to support and develop our existing system for gathering and processing Big Data. You’ll be joining a growing team that is excited about technology and scalability, and eager to share its extensive knowledge with you. We’re particularly interested in engineers who have a talent for developing unconventional, fully operational solutions. A successful candidate combines strong analytical and programming skills with the ability to diagnose and solve functional and performance problems.
Job requirements
On a day-to-day basis you will:
- maintain and improve existing data pipelines and add new data processing pipelines in Airflow
- optimize pipelines for handling large amounts of data
- actively participate in choosing the methods we use to solve technological problems
We expect:
- Python 3 (Pandas)
- MySQL or PostgreSQL (including query optimization)
- Cassandra
- Airflow
- Experience processing large data sets (from GBs up to TBs)
- 3–4 years of experience in a similar role
- Ability to write modular and testable code
- Experience with unit testing and Linux
- Communicative level of spoken and written English
Nice to have:
- Kubernetes / Docker
- Jenkins CI / Gitlab CI
- ClickHouse
- Experience designing and building data pipelines using any of (but not limited to) the following: Kafka, Spark, NoSQL, Hadoop/HDFS/Hive, Flink, Druid, or their cloud alternatives