- Building big data pipelines for batch and real-time data processing;
- Developing, loading and running predictive models in machine learning platforms;
- Building and running data analytics environments;
- Analyzing existing ETL processes and designing new Spark-based solutions;
- Collecting, processing and cleansing data from a wide variety of sources; transforming unstructured data sets into structured data for algorithm input;
- Evaluating the effectiveness of user experiences, determining what data is needed and how to collect it; integrating ML models into production;
- Designing and using validation tools to compare the results of the original and new solutions;
- Building and maintaining a Hadoop or Spark cluster, together with the many other tools that are part of the ecosystem: databases (such as Hive and HBase) and streaming data platforms (Kafka, Spark Streaming, etc.).
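As a minimal illustration of the cleansing responsibility above, this plain-Python sketch (hypothetical log format and field names; a real pipeline would read from files, Kafka topics, or HDFS) turns unstructured log lines into structured records suitable for algorithm input:

```python
import re

# Hypothetical raw input; real sources would be far larger and messier.
RAW_LOGS = [
    "2024-01-15 10:32:01 INFO user=alice action=login",
    "garbled line without structure",
    "2024-01-15 10:33:45 ERROR user=bob action=checkout",
]

# Expected line schema, expressed as a regex with named groups.
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>\w+) user=(?P<user>\w+) action=(?P<action>\w+)$"
)

def parse_logs(lines):
    """Cleanse raw lines: keep only those matching the expected schema."""
    records = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:  # drop malformed lines instead of failing the whole batch
            records.append(m.groupdict())
    return records
```

The same parse-and-drop pattern maps directly onto a Spark job (e.g. a `flatMap` over an RDD or a UDF over a DataFrame), which is where it would live in production.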
- Java/Scala/Python;
- Apache Spark (Core, Streaming, SQL);
- Apache Hadoop (Hive, HBase, HDFS);
- CI/CD (Docker, Jenkins, Git);
- Various machine learning tools;
- SQL and NoSQL databases;
- Some experience with REST API web services;
- Some experience with Kafka and IBM MQ;
- Linux.
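The validation tooling mentioned under the responsibilities can be sketched in a few lines of plain Python (hypothetical row shapes and field names; a real tool would compare Hive tables or DataFrame outputs). It checks that a reworked pipeline produces the same rows as the original, order-insensitively:

```python
from collections import Counter

def compare_outputs(original_rows, new_rows, key_fields):
    """Compare results of an original and a reworked pipeline.

    Rows are dicts; the comparison is order-insensitive and reports
    rows present in one output but not the other.
    """
    def signature(row):
        return tuple(row[f] for f in key_fields)

    orig = Counter(signature(r) for r in original_rows)
    new = Counter(signature(r) for r in new_rows)
    return {
        "only_in_original": orig - new,  # rows the rewrite lost
        "only_in_new": new - orig,       # rows the rewrite invented
        "match": orig == new,
    }

# Hypothetical outputs from a legacy ETL and its Spark rewrite.
legacy = [{"id": 1, "total": 10}, {"id": 2, "total": 20}]
rewrite = [{"id": 2, "total": 20}, {"id": 1, "total": 10}]
report = compare_outputs(legacy, rewrite, key_fields=("id", "total"))
```

Using a `Counter` of row signatures (rather than sets) also catches duplicate-row regressions, a common failure mode when migrating ETL joins.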
- Possibility to be involved in international projects;
- Benefits package (health care, multisport, lunch tickets, petrol vouchers and shopping vouchers, etc.);
- Career development center;
- Free English classes with native speakers (certified English teachers);
- Vast opportunities for self-development: online courses and a library, experience exchange with colleagues around the world, partial coverage of certification costs;
- Possibility to relocate for short- and long-term projects;
- Relocation package for those who relocate to Gdansk from other locations;
- Fruit on a weekly basis;
- Competitive compensation depending on experience and skills;
- Regular assessments and salary reviews;
- Social package (medical care, sports);
- Corporate and social events.