Senior Big Data Wrangler Resume
Seattle, WA
SUMMARY:
Innovative, analytical, and solution-driven software developer with 10 years of experience. In my current role at Confidential as a Senior Big Data Engineer, I have been investigating petabytes of operational production data sources and building useful data flows with upstream consolidations. In addition, I have been monitoring, validating, and analyzing data from multiple stations across different time zone offsets. Data is received from different stations through IaaS instances, processed, and transmitted to the Hadoop cluster via various data-transmission APIs, including the MapReduce-based MFIS. Production jobs are coordinated by Oozie and/or crontab from/to different gateway servers before finally landing in the Hadoop cluster, where data is cleaned and ingested into managed Hive tables for monitoring and validation via coordinated active queries. In previous roles at companies such as Confidential and Confidential, I wrangled big data with Hadoop, Spark, Pig, Oozie, Hive, Flume, Sqoop, HBase, SQL and NoSQL, AWS, machine learning, Informatica, big queries, and text mining for key business initiatives. Worked closely with peers, vendors, and other IT companies to identify the operational services required to enable exploitation of the Big Data platform. Engaged constructively in Git-serviced source control and agile (Scrum and/or Kanban, CI/CD) team Confidential to support project objectives through production of sound architectural principles.
TECHNICAL SKILLS:
Java: Spark, Scala, scripts (shell/bash, SQL, Oozie, active queries), Hadoop MR1/MR2, Hive (Hue Beeswax; Hive and Beeline CLIs), NoSQL HBase, Machine Learning, Data Visualization, Platfora, REST
DATABASES: SQL, MySQL; AWS: EC2, EMR, S3
TOOLS: Eclipse Oxygen, R Studio, SPSS, Weka, NetBeans IDE, BlueJ, Notepad++, Genie
PLATFORMS: Linux, Cloudera (CDH 5.9), Hortonworks Data Platform, Ambari, Windows
DESIGN PATTERNS: Abstract Factory, Factory Method, Composite, Facade, Template Method, MVC, and Singleton
Software Dependency: Maven, Ant
Hadoop Testing Tools: MRUnit, Mockito, HiveRunner, Beetest, Hive test
No automation tools or frameworks are available yet for Flume, Sqoop, or Oozie unit testing; these were tested manually.
Source controls: Git, SVN
Source control services: Github
Agile Confidential / Bug Tracking: Jira agile boards (sprints) and/or CI/CD
Team communication: Slack
Continuous Integration: Jenkins, code reviewing
Code reviewing: git pull request
Static Code Analysis: FindBugs, CheckStyle
Logging: Log4J
Formats: RCFile, JSON, Avro, xSV
Compression: gzip, Snappy, LZ4
PROFESSIONAL EXPERIENCE:
Senior Big Data Wrangler
Confidential, Seattle, WA
Responsibilities:
- Usage: Hive queries, Pig scripts, Sqoop, and MapReduce
- Built by: custom MFIS plug-in (MapReduce)
- Partitioning: UTC date and network
- Formats: TSV, CSV, RCFile, JSON, Avro, xSV
- Compression: gzip (MFIS-partitioned blocks), LZ4
- Date and time formatting: ISO 8601, plus an integer Unix epoch column for faster numeric comparisons (see the sketch after this list).
- Issues: small files required upstream consolidation.
- Worked across different data formats, analyzing, extracting, and flattening complex data structures; organized columnar layouts based on usage; validated the data every morning in production with frequency-based Oozie/crontab jobs.
- Monitored daily production alerts through email, Slack posts, and SNS topics, and resolved alerts accordingly.
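A minimal sketch of the dual timestamp convention from the list above (an ISO 8601 string plus an integer Unix epoch), assuming a Java ingestion step; the class and variable names are illustrative, not part of the production MFIS code:

    import java.time.Instant;
    import java.time.format.DateTimeFormatter;

    public class EventTimestamps {
        public static void main(String[] args) {
            Instant now = Instant.now();
            // ISO 8601 in UTC, e.g. 2024-01-01T12:00:00Z, for readable partitions and logs.
            String iso8601 = DateTimeFormatter.ISO_INSTANT.format(now);
            // Integer epoch seconds as a second column for fast numeric range comparisons.
            long epochSeconds = now.getEpochSecond();
            System.out.println(iso8601 + "\t" + epochSeconds);
        }
    }

Storing both costs one extra column, but lets query predicates compare integers instead of parsing date strings.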
Big Data Developer
Confidential, Fairfield, IA
Responsibilities:
- Implemented this project as a Big Data engineer to extract valuable patterns from innovation research survey data. The data spans various scholarly disciplines (Engineering, Physics, Biology, ...), tools (R, Weka, SPSS, ...), research innovation status (published, unpublished, ongoing, no info), year, age, organization, and setting (field and laboratory). Extracted insights such as which innovation tools were applicable across years, disciplines, countries, and age groups, and performed various complex analyses (regression for predictive analysis, correlation, decision trees, clustering, and association of the given dependent and independent variables).
- Worked on this big data collection using MongoDB, Hive, Pig, and HDFS to explore global research practices and opinions on scholarly communication tools across disciplines.
- Data visualization: big data visualization dashboards (charts, graphs) using R, Weka, SPSS
Tools and Technologies: Java, Flume, Spark, Agile Scrum methodology to ensure project efficiency, MRUnit, REST API services; raw JSON and final TSV data.
Big Data Developer
Confidential
Responsibilities:
- Stored and processed petabytes of data in a Hadoop cluster from different data centers, under different categories with geographic offsets: science, trade, education, social questions, etc.
- Worked on the Social Questions Data Center (SQDC) to build social-questions models on a country-by-country basis. SQDC makes the data freely available as compressed (Gzip, Bzip2 codecs) files of day-over-day updates. SQDC's immigration-related databases contain legal, illegal, and permanent-resident tables with a minimum of 297MM records.
- Used Hadoop MapReduce to process, Hive and Pig to access, Sqoop to integrate, and Oozie to monitor these big datasets and extract insight from them.
- Sqooped data into the Hadoop cluster and managed it via Hive CLI scripts coordinated with Oozie.
- Partitioned newly arrived data by geoid and datetime (a sketch follows below).
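A minimal sketch of deriving such a landing path per record, assuming Hive-style geoid/datetime partition directories; the base path and key names are illustrative, not the actual production layout:

    import java.time.LocalDate;

    public class PartitionPath {
        // Hypothetical base directory for newly arrived SQDC data.
        private static final String BASE = "/data/sqdc/incoming";

        // Build a Hive-style partition path, e.g. /data/sqdc/incoming/geoid=US/dt=2024-01-01
        static String partitionFor(String geoid, LocalDate date) {
            return String.format("%s/geoid=%s/dt=%s", BASE, geoid, date);
        }

        public static void main(String[] args) {
            System.out.println(partitionFor("US", LocalDate.of(2024, 1, 1)));
        }
    }

Keying directories on geoid and date lets Hive prune partitions instead of scanning the full dataset.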
Cluster Manager
Confidential
Responsibilities:
- Implemented a modern data pipeline architecture: (1) data ingest from data sources into (2) the data pipeline (HDFS, AWS, HBase, ...), processed with Spark and Hadoop and accessed with Hive and HBase (operational distributed storage), feeding (3) data visualization dashboards and predictive models (SPSS, Weka, R, ...).
- Implementations: big data volumes across 10 clusters at 6 different locations, each with 1,000 nodes and 9-12 PB of data; managed on average 1 PB of data per HDFS application master.
- Resulted in an efficient BI solution for the social-questions model.
- Designed, implemented, and supported data-ingestion routines in Hadoop; transformed (summarized, queried, and analyzed) ingested data using Hive and loaded it into target repositories (Hive/HBase), as sketched after the tools list below.
Tools and Technologies: Java, Agile methodology to ensure project efficiency, MRUnit, a big data repository server with Git, and Hadoop ecosystem modules: NoSQL HBase, MongoDB, Oozie, Hive, Pig, and Apache Spark with functional programming in Scala.
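A minimal sketch of the Hive-to-HBase load step described in the last bullet, using the standard HBase Java client; the table name, column family, and row-key scheme are illustrative assumptions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HiveToHBaseLoad {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 // Hypothetical target table holding the transformed Hive output.
                 Table table = conn.getTable(TableName.valueOf("social_questions"))) {
                // One row per (country, date) key from the Hive summary.
                Put put = new Put(Bytes.toBytes("US#2024-01-01"));
                put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("record_count"),
                              Bytes.toBytes(297_000_000L));
                table.put(put);
            }
        }
    }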
Big Data Developer
Confidential
Responsibilities:
- Worked on big data analysis using Weka, SPSS, and RStudio to explore data on how Torino travels by bike; generated dashboard analyses to capture the existing flow.
- Extracted data from the source, transformed it by cleaning and deriving valuable knowledge, and loaded it. Extracted the frequency of customers (subscribed vs. unsubscribed), the best seasons with corresponding routes, trips per day and per season, and hours versus customers.
- During this project I implemented the following modules:
- Data Access Components: Pig and Hive
- Data Integration Components: Apache Flume, Sqoop
- Data Confidential and Monitoring Components: Cloudera, Oozie
- Representation of results: decision trees, regression trees, charts (pie, bar, ...)
Tools and Technologies: Rstudio, SPSS, Weka, Sqoop, Hive, Oozie, Agile Scrum methodology to ensure project efficiency.
Big Data Developer
Confidential
Responsibilities:
- Tasked with refining and visualizing Hadoop cluster log data for security (in the event of a suspected security breach, how can server log data be used to identify and repair the vulnerability?) and compliance (for system audits).
- Used Pig and Hive on top of HDFS for processing. Implemented Hadoop MapReduce jobs in Java for cleaning and monitoring metadata (a mapper sketch follows below).
- Analyzed Hadoop log files; provided support, maintenance, and ongoing monitoring.
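A minimal sketch of such a log-cleaning MapReduce job, using the standard Hadoop Mapper API; the tab-separated line format and filter rule are illustrative, not the actual production logic:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Drops malformed log lines and emits (sourceHost, cleanedLine) pairs.
    public class LogCleanMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            // Assumed format: host<TAB>timestamp<TAB>message
            String[] fields = line.toString().split("\t", 3);
            if (fields.length < 3 || fields[1].isEmpty()) {
                return; // skip malformed records instead of propagating them
            }
            ctx.write(new Text(fields[0]), new Text(fields[2].trim()));
        }
    }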
Java Developer
Confidential
Responsibilities:
- Developed an efficient user interface for a desktop application.
- Ensured reliable and secure data transactions by implementing the Windows Communication Foundation (WCF) framework, which provides an efficient service-oriented architecture (SOA).
- Optimized system performance. The system helps employees manage and review their work (tasks) easily on a day-to-day basis. Implemented secure IM with different endpoints using services hosted by the application.
Java Developer
Confidential
Responsibilities:
- Developed an automated workflow process based on client requirements.
- Integrated business-model logic with the client's laboratory/task Confidential procedures. The LIMS contains various modules such as Store Confidential, Sample and Task Tracking, Customer Relationship Confidential, Laboratories and Parameter Confidential, Role Delegation, Cost Confidential, Customer Complaint Handling, User Tasks Confidential, test reports, and other relevant report generation.
Technologies used: Java, JSF, EJB, Ajax, JavaScript, CSS, MySQL, Jasper report, Eclipse, GitHub, Scrum Agile.
Java Developer
Confidential
Responsibilities:
- As a key contributor to software development, supported requirements Confidential, workflow development, UI design, database development, programming, and unit testing. Prepared technical documentation. Contributed to ongoing process and systems enhancement. Worked collaboratively to resolve issues.
- Built and integrated a Human Resources Confidential web application using Java, J2EE, JSP, Servlets, Core Java, XML, Spring MVC, and Hibernate. Utilized JasperReports to print payrolls and other reports.
- Provided a holistic view of employee performance, giving the CEO and administrators powerful tools and improving decision-making, productivity, and cost-efficiency.
- Implemented software that measurably reduced time spent on customer records and inventory while improving sales figures, inventory Confidential, and productivity.
- Boosted customer sign-ups by designing a system that offered effective registration incentives; introduced features that measurably increased sales.
- Implemented VoIP using JMF as a communication service for users at different levels.
Java Developer
Confidential
Responsibilities:
- Automated the library and provided services for operators and the administrator.
- Performed coding, testing, and debugging.
- Designed a desktop user interface with JavaFX to manage books, checkout records, and memberships.
- Frontend: Servlets, HTML, CSS, and JavaScript. Backend: MS Access, Apache Tomcat server.
Tools and Technologies: Java, Eclipse, StarUML, and JavaFX Scene Builder.
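A minimal JavaFX sketch in the spirit of the book-management UI described above, assuming JavaFX is on the classpath; the window title and sample entries are illustrative:

    import javafx.application.Application;
    import javafx.collections.FXCollections;
    import javafx.scene.Scene;
    import javafx.scene.control.ListView;
    import javafx.stage.Stage;

    public class LibraryUiSketch extends Application {
        @Override
        public void start(Stage stage) {
            // Hypothetical checked-out book list backing the main view.
            ListView<String> books = new ListView<>(
                    FXCollections.observableArrayList(
                            "Hadoop: The Definitive Guide", "Programming Hive"));
            stage.setTitle("Library Manager (sketch)");
            stage.setScene(new Scene(books, 400, 300));
            stage.show();
        }

        public static void main(String[] args) {
            launch(args);
        }
    }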
Java Developer
Confidential
Responsibilities:
- Conducted analysis of project requirements and formulated plans based on client specifications.
- Applied agile methodology to ensure project efficiency.
- Implemented the Hibernate and Spring frameworks for back-end persistence and business-logic development.
- Designed the GUI with JavaServer Faces (JSF) for building component-based user interfaces for web applications.
- Performed CRUD operations (a JPA sketch follows the tools list below).
Tools and Technologies: Hibernate, Spring, Spring MVC, JMS, JSF, Servlets, J2EE APIs, Apache Tomcat Server
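A minimal sketch of the CRUD layer described above, using plain JPA over Hibernate; the entity and persistence-unit names are illustrative assumptions:

    import javax.persistence.Entity;
    import javax.persistence.EntityManager;
    import javax.persistence.EntityManagerFactory;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import javax.persistence.Persistence;

    @Entity
    class Task {
        @Id @GeneratedValue Long id;
        String title;
    }

    public class TaskCrud {
        public static void main(String[] args) {
            // "app" is a hypothetical persistence unit defined in persistence.xml.
            EntityManagerFactory emf = Persistence.createEntityManagerFactory("app");
            EntityManager em = emf.createEntityManager();

            em.getTransaction().begin();
            Task t = new Task();
            t.title = "Review sprint backlog";
            em.persist(t);                          // Create
            Task found = em.find(Task.class, t.id); // Read
            found.title = "Review sprint backlog (done)"; // Update via managed entity
            em.remove(found);                       // Delete
            em.getTransaction().commit();

            em.close();
            emf.close();
        }
    }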
Java Developer
Confidential
Responsibilities:
- Gained in-depth experience developing a ground control station for a mini UAV. The UAV sends a huge amount of data every microsecond, so the data stream was stored in several efficient file formats.
- Recorded this data in a SQL Server database and implemented efficient SQL commands to handle it (a JDBC batch sketch follows the tools list below).
- Processed data at the log server (using Apache Flume) and in SQL Server (using Sqoop), and also processed the data through MapReduce, Pig, and Hive. The work earned honors as an efficient BI solution.
Main responsibilities:
- Analyzed requirements and performed design implementations.
- Implemented business requirements
- Reviewed code and conducted performance testing.
- Contributed to documentation and logic flow charts.
Tools and Technologies: Spring and Hibernate, NetBeans IDE, Eclipse, HDFS; JUnit test cases for all modules to ensure complete code coverage.
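A minimal sketch of the high-rate recording step above, assuming plain JDBC with batched inserts and the Microsoft SQL Server JDBC driver on the classpath; the table, columns, and connection details are illustrative:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class TelemetryWriter {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection string; real credentials come from config.
            String url = "jdbc:sqlserver://localhost:1433;databaseName=uav";
            try (Connection conn = DriverManager.getConnection(url, "user", "pass");
                 PreparedStatement ps = conn.prepareStatement(
                         "INSERT INTO telemetry (epoch_micros, sensor, value) VALUES (?, ?, ?)")) {
                conn.setAutoCommit(false);
                for (long i = 0; i < 1_000; i++) {
                    ps.setLong(1, i);
                    ps.setString(2, "gyro_x");
                    ps.setDouble(3, Math.sin(i / 100.0)); // placeholder reading
                    ps.addBatch();
                }
                ps.executeBatch(); // one round trip for the whole batch
                conn.commit();
            }
        }
    }

Batching amortizes network round trips, which matters when the sensor stream arrives faster than single-row inserts can commit.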
Java Developer
Confidential
Responsibilities:
- Played an active role in project development and contributed substantially to converting the existing analog surveillance system into an IP-based video surveillance system.
- Imported and processed byte-based chunks of streaming data from different communication devices: Sony and Axis IP cameras, a controlling antenna, RC, and various sensors. Imported data from these digital devices and integrated third-party libraries for the online streaming and control systems on the website.
- Used Angular for automatic synchronization of data between model and view components, implementing Angular's data-binding and dependency-injection techniques.
- This helps control the system at various levels, including from mobile devices. Implemented the project in a clean MVC fashion by customizing the client-side application with JavaScript.