Data Engineer, Platform at SmartNews (Shibuya-ku, Japan)

Description: 

Job Description / General Message


Data engineering at SmartNews plays a key role in accelerating product and business development. We put great effort into building a highly efficient and flexible data service for analytical and operational purposes. To serve internal users on the analytics and product development teams, the mission of our data engineers is to create high-level, easy-to-use data services that simplify accessing, integrating and consolidating various data sets, and to build the platforms that run tasks processing massive data volumes, on the order of terabytes per day.


To achieve optimal cost-performance, we are always looking for better solutions that fulfill user requirements and guarantee our SLAs/SLOs. At SmartNews, we eagerly adopt advanced technologies from the software engineering, database and, especially, big-data open-source communities.




Responsibilities



  • Design, build and implement new services, libraries, tools and frameworks for data processing and management, and investigate new algorithms to make data processing more efficient, in areas such as ETL, data pipelines, OLAP DBMS, real-time message and stream processing, and data synchronization between systems.

  • Evaluate, monitor and tune the performance of data processing procedures and platforms, gain insight into their efficiency and stability, and improve them continuously, for example by optimizing distributed query engines, computing resource management and isolation, and multi-tier storage systems.

  • Own and maintain the key data processing portfolio, including APIs for accessing data, ETLs for processing data, storage systems/DBMS for hosting data, and the underlying hardware and software architecture, in order to build a durable and scalable data platform as a service, for example by building a metadata management system or data lake, or working with data flow and workflow frameworks.

  • Work closely with data architecture and modeling roles to understand how to implement the data service, and work with the Site Reliability Engineering (SRE) team to deploy environments and drive production excellence.

  • Diagnose and resolve complex technical challenges in data access and processing, using elegant and systematic rather than ad hoc methods to help other teams tune performance and improve stability.







Qualifications



  • BS/MS degree in Computer Science or Software Engineering, or equivalent practical experience

  • Strong programming skills and experience, with the deep understanding of data structures and algorithms required to build efficient and stable solutions

  • Understanding of the basic concepts of parallel and distributed programming and data processing, such as MapReduce

  • Working knowledge of shell scripting and operating systems, especially Linux

  • Deep understanding of modern big data technologies and ecosystems

  • Familiarity with modern data stores, either RDBMS or NoSQL (such as HBase, DynamoDB/Cassandra or Druid); experience developing applications on, or functional extensions of, such data stores

  • Ability to implement and tune complex, heavy-lifting data flows (ETLs or pipelines), and familiarity with the relevant tooling

  • Ability to design systems with good modularity and extensibility

  • Ability to draft user-understandable blueprints and precise, detailed designs

  • Familiarity with system/module design methods and tools such as UML

  • Familiarity with Hadoop, Spark, Hive, Presto, Storm or Flink, and the ability to develop batch or streaming data processing programs with them

  • Rich experience with one or more programming languages such as Java, Scala, C++ or Python; familiarity with agile development and testing practices







Preferred Qualifications



  • Experience designing and developing large-scale data processing systems, especially massive log processing and OLAP DBMS.

  • Good knowledge of data integration and data warehouse design and development, such as integration patterns, high-level design and detailed logical/physical design.

  • Good knowledge of and experience with DBMS and query optimization techniques, including an understanding of indexing, query optimization, parallel execution, transaction/consensus management and storage engines.

  • Experience working with data scientists and analysts, and an understanding of the pain points of data analysis.

  • Contributions to the open-source community, especially to projects with large user bases.








Benefits and Perks



  • Voluntary Trip - Working remotely twice a year

  • SmartKitchen - Healthy lunch on a daily basis for free

  • ChikyuCoffee - Delicious coffee provided by our Barista every day

  • Event space - Free to use for any kind of meetup

  • Foreign language development support

  • Various social insurance benefits included

  • Transportation coverage (Maximum 50,000 yen)



