Sunday, August 7, 2016

BigData: Hadoop and MapReduce

This post gives a general idea of Big Data and of the Hadoop architecture with MapReduce.

Data Sources
According to IBM: "Every day, 2.5 billion gigabytes of high-velocity data are created in a variety of forms, such as social media posts, information gathered in sensors and medical devices, videos and transaction records"

Definition of Big Data 
Big Data is a loosely defined term used to describe data sets so large and complex that they become awkward to work with using standard statistical software. (International Journal of Internet Science, 2012, 7 (1), 1–5)

The 3 Vs - Volume, Variety, Velocity
Volume - A huge volume of data needs to be stored, which calls for cheap and reliable storage solutions.
Variety - Data comes in many types and formats, and all of it is stored in its raw format.
Velocity - The ability to process data as it arrives, which can be very fast when the amount of data is huge.


The 3 Vs were first defined by Douglas Laney in a 2001 research report titled "3D Data Management: Controlling Data Volume, Velocity and Variety".
In 2012 he updated the definition as follows: "Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization".

Doug Cutting, Creator of Hadoop
He used the papers Google published about its distributed file system (GFS) and its processing framework, MapReduce, to create an open-source version of Google's system. After he joined Yahoo, the Hadoop project was created as a scalable and stable version of that earlier system.

The papers Google published about its distributed file system and its processing framework are "The Google File System" (2003) and "MapReduce: Simplified Data Processing on Large Clusters" (2004).

"Hadoop" is the name of a toy elephant that belonged to Cutting's son.

Core Hadoop
Core Hadoop consists of a cluster of networked computers that stores data in HDFS (Hadoop Distributed File System) and processes it with MapReduce.

Hadoop Ecosystem 

HDFS is used to store data.
MR (MapReduce) is used to process the data in HDFS. But to use MR directly we need to write code in a programming language such as Java or Python.

The alternative is to use:
Hive, which converts SQL-like queries to MR jobs. These are used to run batch processes.
Pig, which converts simple script commands to MR jobs.

But because both still run as MR jobs, they can be time-consuming on large amounts of data.

To overcome this, other projects were developed:
Impala, which takes SQL queries and runs them directly over the data in HDFS, without going through MR. It is highly optimised and much faster than Hive or MR.
HBase, which is a real-time database built on top of HDFS.

Sqoop takes data from a relational (SQL) database and puts it in HDFS, so it can be processed along with other data.
Flume ingests data as it is generated by external systems and puts it in the cluster.

Hue is a graphical front end to the cluster.
Oozie is a workflow management tool.
Mahout is a machine learning library.

All of these come as part of CDH (Cloudera's Distribution including Apache Hadoop). CDH is free and open source.

Understanding HDFS
 
In HDFS, files are divided into blocks of 64 MB by default (newer Hadoop releases default to 128 MB). Each block is given a name starting with blk_.
Blocks are spread across the nodes in the cluster, called data nodes. Each cluster has a name node, which keeps track of which blocks make up which file. This information in the name node is the metadata.


A 64 MB block size helps, compared with the much smaller blocks (for example 16 KB) used by other file systems:
1. Management by the name node is easier; otherwise there would be too many blocks to track.
2. A mapper is needed for each block we want to process; with small blocks there would be a lot of mappers, each processing a tiny piece of data, which isn't efficient.
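
To put rough numbers on the two points above, here is a minimal Python sketch that counts how many blocks (and therefore how many name node entries and mappers) one file needs at each block size. The 1 GB file size and the helper name block_count are assumptions made up for illustration; they are not part of Hadoop.

```python
import math

KB = 1024
MB = 1024 * KB


def block_count(file_size_bytes, block_size_bytes):
    """Number of blocks needed to hold a file (the last block may be partial)."""
    return math.ceil(file_size_bytes / block_size_bytes)


# Hypothetical 1 GB file, chosen only to make the arithmetic easy to follow.
file_size = 1024 * MB

large_blocks = block_count(file_size, 64 * MB)
small_blocks = block_count(file_size, 16 * KB)

print(large_blocks)  # 16
print(small_blocks)  # 65536
```

With 64 MB blocks the name node tracks 16 blocks for this file; with 16 KB blocks it would have to track 65536, and a job over the file would need that many mappers.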

To guard against network failure and disk failure, each block is stored on 3 different data nodes. This is data redundancy. If one of the nodes fails, the name node detects that some blocks are no longer replicated 3 times and starts making new copies of them, so that each block is again replicated 3 times in the cluster.

To protect against name node failure, the name node is backed up by a standby name node, which is activated when the primary is not responding. The name node's data can also be backed up to a remote disk using NFS (Network File System).

Hadoop Commands
Hadoop commands start with "hadoop fs" and are similar to their UNIX counterparts.
e.g. hadoop fs -ls : list the files in a directory
     hadoop fs -put : put a local file into a particular HDFS folder

Map Reduce
MapReduce breaks the complete data set into small chunks and processes them in parallel. Mappers work in parallel on the chunks and emit intermediate records as key-value pairs; reducers then combine the values for each key into the final result.
Hadoop takes care of the Shuffle and Sort phase between the two. You do not have to sort the keys in your reducer code; you get them in already sorted order.

 
The Job Tracker is the MR daemon that keeps track of all pending jobs in a cluster, and each data node runs its own Task Tracker to track the tasks assigned to it. This makes it simple to process the data on the individual data nodes in the cluster: mappers process the data locally, which speeds up the job. Once the mappers have finished, the data is sorted and passed to the reducers, which may be running on different nodes in the same cluster.
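
The map, shuffle-and-sort, and reduce flow described above can be sketched in plain Python on a single machine. This is only an illustration of the idea, not Hadoop code: the function names (mapper, shuffle_and_sort, reducer, run_job) and the sample lines are made up for this example, and a real cluster would run the mappers and reducers on different nodes.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle_and_sort(pairs):
    """Stand-in for Hadoop's shuffle and sort: group values by key, keys in sorted order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reducer(key, values):
    """Reduce step: combine all the values seen for one key."""
    return key, sum(values)


def run_job(lines):
    """Run the three phases one after another on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in mapper(line)]
    return [reducer(key, values) for key, values in shuffle_and_sort(mapped)]


lines = ["big data needs big storage", "hadoop stores big data"]
result = run_job(lines)
print(result)
# [('big', 3), ('data', 2), ('hadoop', 1), ('needs', 1), ('storage', 1), ('stores', 1)]
```

Note that the reducer never sorts anything itself; it receives the keys already grouped and ordered, which mirrors what Hadoop does for you.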

Resource Web Link:
Udacity Course: https://www.udacity.com/course/intro-to-hadoop-and-mapreduce--ud617
Cloudera (CDH): https://www.cloudera.com/products/apache-hadoop/key-cdh-components.html
       

Sunday, July 13, 2014

Fwd: Pet Assessment

Hello,

Please help me by completing this assessment.

Also, please forward this to your friends.

Regards,

Amit Agrahari

----------------------------------------------------------------------------------------------------------

If you have trouble viewing or submitting this form, you can fill it out online:
https://docs.google.com/forms/d/1-WuhfJ_2LK21JAVqKO56_z2ThVCF2kdrLRXtYoX4554/viewform?c=0&w=1&usp=mail_form_link

Pet Assessment

Please complete this survey for a project.
It will take less than 2 minutes to complete this survey.


Wednesday, June 5, 2013

Project Execution Skills

Project management is the single most important skill in one's life, and it can make him or her a successful person. Even for organizations and countries, basic project management is very important.

Project Management Today:
Today, when people do project management, the plan may look good but is not possible in practice.
In 2007, 38% of the projects executed worldwide were successful.
Last year, in 2012, this number came down to 32%. (The reason was the economic recession, which was itself caused by poor project planning.)
Even a company like Boeing, which was at the pinnacle of project planning and execution, stumbled with its Dreamliner in the urge to beat the competition.
Nevertheless, some banks in Africa did exceptionally well during the same recession. They took advantage of cheaper labor, land, etc. to expand.
The crux of the introduction above is that project planning is the most important aspect of any project, and it can even be applied to an individual's life (discussed in detail in the next section).

Success = Meeting desired result:
How to define success?
Success is meeting a desired result (as expected or planned).

Work:
When we think of work, there are only two kinds of work in this universe:
1. Projects
2. Operations



Projects: Every project has 3 characteristics:
  • Temporary (It should be time bound)
  • Unique (It should be unique)
  • Progressive Elaboration (There has to be a continuous learning curve)

Operations: In contrast, operations start only after a project is complete, when a final hand-over is done from the project team to the operations team. The hand-over is the most difficult part, as the two teams function in totally different manners. There is a barrier of language: not literal language like English or Hindi, but the understanding of the same terms. Projects fail when there is no common language or set of definitions used by everyone. E.g.: 'Quality' can have a different meaning for everyone.

Life as an Example:
Let's evaluate life and see whether it is a Project or an Operation.
1. Everyone has a limited life span. (Temporary)
2. Every individual has unique life experiences and is unique in every aspect. (Unique)
3. As we grow, we continue to learn. (Progressive Elaboration)

Bingo, our life is nothing but a Project. Do we really plan our life? Most of the time the answer is 'NO', so it is time to reflect on this. The most successful people in the world have one thing in common: they planned well and also executed well. Half of your task is accomplished if you plan well.

Planning:
If we plan well before executing any project or task, we have taken a big step towards the success of that project or task.
Example: A 4-star hotel was built in just 15 days in China (Google to know more).
                Even in India, a 10-story building was built in just 48 hours (Google to know more).
This was only possible because of the amount of planning that went into these projects.

Some important points:
1. Planning is not optional (though we tend to think it is).
2. Too much planning leads to paralysis.
3. Planning should be done in a structured way.
4. 'JUGAD' - It is not a solution but a patch-up, which we do to cover up bad planning.
5. "Biting the Silver Bullet" - Once someone has made a commitment without much thinking, we are bound to live up to it.

Some common terms and their definitions:

The Iceberg Effect:
In management, there is a term called the Iceberg Effect. If you observe an iceberg, as below, you'll notice that only 10% of it is visible to us; the remaining 90% is below the water. This is significant because we look at problems in a similar fashion: we give solutions based on the visible 10% of the problem, without even thinking about the other 90%. (Google to read more.)

The visible problem is only 10%.

The real problem is the other 90%:
1. How does project management work?
2. Lack of communication and inter-personal skills.

It is seen that lack of communication is the biggest challenge in an organization and in personal life.

Communication challenge:
Communication is a challenge in an organization because it has a hierarchy (horizontal divide) and teams/departments (vertical divide). These divisions are responsible for the communication gap, as shown below:
Another important aspect of communication is that you may understand something completely and still fail to make others understand it completely. So you should follow this principle: "I'm responsible for what I say and also responsible for what you understand." It is therefore very important to check whether others have fully understood you.

Communication skills are also important because we are dealing with human beings, and everyone has their own way of communicating and understanding. So we have to adapt our style of communication as needed.

Stakeholder:
"An individual or body who positively or negatively affects your project."
E.g.: Positive - Project manager (w.r.t. a project), Friends (w.r.t. life)
         Negative - Environment Ministry (for a new industrial installation)

For successful management, the stakeholders have to be satisfied. As a rule, you cannot satisfy everyone, so you need to prioritize the stakeholders and decide whom to satisfy and whom to turn down. More about this when we discuss the project execution stages.

Pareto Principle: 80/20 Principle
It states: "If you solve 20% of your problems, 80% of the solution is met."
Identifying the top 20% of problems and solving them will deliver 80% of the total expected solution.

This is also true of effort: the most important 20% of your efforts will give you 80% of the expected results. In India, this is very well known in the Marwadi community, and the saying goes: "Hing lage na fitkari, aur kaam bhi chokha". It means you can achieve your target with minimal effort; in fact, you should always try to achieve your goals with minimum effort. (Google for more info)
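
As a rough illustration of the 80/20 idea, the Python sketch below sorts problems by how often they occur and picks the few largest ones that together cover 80% of the total. The function name vital_few, the problem names and the counts are all hypothetical, invented so the numbers work out cleanly; real data will rarely split at exactly 80/20.

```python
def vital_few(problem_counts, target_percent=80):
    """Pick the largest problems that together reach target_percent of all occurrences."""
    total = sum(problem_counts.values())
    ranked = sorted(problem_counts.items(), key=lambda item: item[1], reverse=True)
    selected, covered = [], 0
    for name, count in ranked:
        # Stop as soon as the problems picked so far cover the target share.
        if covered * 100 >= target_percent * total:
            break
        selected.append(name)
        covered += count
    return selected, covered / total


# Hypothetical tally of 10 problem types (100 occurrences in total).
problems = {
    "late inputs": 50, "unclear scope": 30, "rework": 5, "tooling": 4,
    "travel": 3, "approvals": 3, "licences": 2, "printing": 1,
    "holidays": 1, "other": 1,
}

top, share = vital_few(problems)
print(top)    # ['late inputs', 'unclear scope']
print(share)  # 0.8
```

Here 2 of the 10 problem types (20%) account for 80 of the 100 occurrences, so those two are the ones to solve first.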


Sponsor:
Someone who makes it possible for you to execute a project. The individual or body who arranges the money, resources, etc. is your sponsor.

Project Life-cycle:
Ideally, a project life cycle should look as below. Cost and resource usage should be high only during the intermediate execution phase (carrying out the work).

But in a real-world scenario, it looks something like below. Due to improper planning, we end up increasing the cost of the product (indicated by the red line) or the delivery time.

Thus, proper project planning is very important. It not only saves cost but also the actual execution time.

Organization Structure:
The structure of any organization is based purely on the type of business it performs. A company in the retail chain business (e.g. Big Bazar) will have a purely operational organization structure. Organizations like DRDO/ISRO are purely into project execution, so they have a project-oriented organization structure.

So, based on these two types of business, we can have 5 types of organization structure:
Functional: Purely in the operations business. E.g. Big Bazar
Weak Matrix: In operations, with occasional project execution. E.g. Toyota
Balanced Matrix: 50% of business in operations and 50% in projects. E.g. Nissan/Renault
Strong Matrix: More in project execution and less in operations.
Projectized: Purely in the project execution business. E.g. NASA, ISRO

Now, it is very important to understand that changing an organization's structure can create havoc, because people are resistant to change. This is the very reason why 98% of mergers fail: the two organizations have totally different structures, and people are not willing to adopt the other's way of working and organization structure.
 


So, now that we are done with the concepts, let's discuss the 5 stages of project execution:

5 Stages of Project Execution:
Project execution has 5 stages, shown in the diagram below:

Even before the start of the Initiation phase, we do a Feasibility Study & Benefit-to-Cost Analysis. On completion of these, we go ahead and initiate the project.

1. Initiation phase:
Initiation of a project starts with:
a. Market input
b. Assignment of a project manager
But the most important factors are defining the:
c. Triple Constraints (Time, Cost and Scope)
d. Objective
e. Justification
f. Stakeholders - Identification and Prioritization

Triple Constraints: This is the constraint of three factors: Time, Cost and Scope. If we want to extend the scope, it will lead to an extension of time or cost or both; and if cost is constrained, then we can only extend the time, at almost the same cost.
In order to assess the triple constraint, we use the following matrix. The only condition is that each column can have only one check mark; this helps us clearly identify the constraints in our project.

Objective: The objective of the project should be very clear, and we should follow the SMART guidelines. The following figure is self-explanatory. Always have SMART objectives in life.

Justification: Always give a justification for the project. It makes clear what the approach to project planning and execution should be. E.g. you may choose to do business with a large organization at no profit, and the justification can be that doing business with a big organization will bring more market acceptance, new business opportunities and more profit. So a justification is always needed.

Stakeholder: This is the most important aspect. It involves the identification and prioritization of stakeholders. Always identify all the stakeholders; the more stakeholders there are, the more management they need. Always do this in groups, since more minds working together will produce more inputs. Once all the stakeholders are identified, they have to be prioritized based on the power they have to influence the project and their interest in its completion.

Key Players (High Power - High Interest) - Manage closely
High Power - Low Interest - Keep satisfied (diplomatically)
Low Power - High Interest - Keep informed
Low Power - Low Interest - Minimal effort / monitor lightly
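
The four quadrants above can be turned into a small lookup. The Python sketch below is one possible way to do it, assuming each stakeholder is scored from 1 to 10 on power and interest and that a score above 5 counts as "high"; the scale, the threshold, the function name and the sample stakeholders are all assumptions made for illustration.

```python
def stakeholder_strategy(power, interest, threshold=5):
    """Map a stakeholder's power and interest scores to a handling strategy.

    Scores use a hypothetical 1-10 scale; anything above `threshold` counts as high.
    """
    high_power = power > threshold
    high_interest = interest > threshold
    if high_power and high_interest:
        return "Manage closely"
    if high_power:
        return "Keep satisfied"
    if high_interest:
        return "Keep informed"
    return "Monitor lightly"


# Hypothetical stakeholders with (power, interest) scores.
stakeholders = {
    "Sponsor": (9, 9),
    "Regulator": (8, 2),
    "End user": (3, 9),
    "Office vendor": (2, 2),
}

for name, (power, interest) in stakeholders.items():
    print(name, "->", stakeholder_strategy(power, interest))
```

Scoring in a group, as suggested above, matters more than the exact scale: the grid only helps if people agree on who sits in which quadrant.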
 
Example: Take a typical Indian wedding. We have so many people to satisfy and look after. In this scenario, we can work out who has high power and high interest (like the bride and groom) and always keep them satisfied and informed. In a similar way, we can identify all the other stakeholders, such as the father, mother and brother of the bride or groom, along with their power and interest, and then keep them satisfied or informed accordingly.
In the same way, we can identify the stakeholders in our social life and decide what priority to give each of them. This way you'll be very clear about how to handle those people.


2. Planning phase:
The second phase of project execution is Planning. It consists of the following:
a. Scope
b. Requirements
c. WBS (Work Breakdown Structure)
d. Schedule (Duration/ Resource/ Dependencies)
e. Risk assessment
f. Communication Plan

Let's discuss each in detail:

Need, Scope, Requirements & Specification:
Need - It comes from outside the team or organization, and the project is initiated on the basis of it.
Scope - The part of the need that we are planning to tackle. It is very important to define clearly what is INCLUDED in the scope and what is EXCLUDED from it.
Requirements - Derived from the scope; they define the business logic.
Specification - The technical business logic, based on the requirements.

Work-Breakdown Structure (WBS):
WBS is the single most important tool in project planning, and it can be used in any type of planning in life.
It is based on a single principle: "IT HAS NO ACTIVITY", yet it is used to find all possible activities.
Break the project down into deliverables, and go on sub-dividing them until you come to a logical end, where dividing further would produce an activity. This lowest-level deliverable is known as a "WORK PACKAGE".
Once a work package is defined, we can break it down into the individual activities needed to deliver it.
Hence, whenever a change request comes, we can look at the work packages and determine which activities would change and what the impact would be. (Google to read more.)

Problems faced when we start using WBS:
1. We are used to thinking in activities.
2. We immediately start thinking about who will do which activity (resource allocation).
3. We think in sequences of activities.

With a little practice, we can start thinking in deliverables (nouns) and not activities (verbs).
E.g.: A company dealing with machine installation.
For any order-to-installation, they had a checklist of 68 items and it took 180 days to install. After using WBS, the checklist grew to 265 items and the average installation time came down to 52 days.
Not only that, they now have a WBS template that is used in each and every project they execute.
So it is advisable to stick to WBS and use it religiously.

(WBS Chart pro- A plugin for MS-Project).

WBS will help you to sequence the activities.

Once the WBS is in place, you need to sequence and schedule the tasks. Once they are properly sequenced and scheduled, you can find the critical path, which directly affects the project output.

Schedule:
A task has only two properties: a start date and an end date. Based on these, the relation between two tasks can be one of the following:
1. Finish-to-start (FS): Once one task finishes, the other can start.
2. Start-to-finish (SF): A task can finish only once another task has started. (Academic in nature, not practical)
3. Start-to-start (SS): A task can start once another has started, so the two inter-dependent tasks run in parallel.
4. Finish-to-finish (FF): The finish of one task coincides with the finish of another.

Also, there can be a Lag (waiting time, e.g. letting a new wall dry before painting) or a Lead (starting early, e.g. procuring raw material before the start of production) between two tasks.


Note: In MS Project, use 1FS to signify that task S.N. 2 is linked to task S.N. 1 by an FS relation.
          To make a duration run through holidays, use e.g. 10ed, where 'e' (elapsed) means holidays and other non-working time are disregarded.

Network Diagram & Critical Path: Making a network diagram helps to identify the time required to complete each activity and also the most critical activities. As a project manager, you need to watch only the critical activities, and need not worry much about the others until they become critical. The critical activities decide the total duration of the project. Based on this you can:
1. Prioritize the resources
2. Re-allocate the resources

Activities that are not on the critical path have float (slack): each can be delayed, by up to the gap between its earliest and latest allowable completion, without delaying the project. A resource assigned to an activity with float can be used on other activities, thereby increasing the productivity of the team/project.
The following diagram shows an example of a network diagram and critical path:

Note: Fortunately, MS Project provides tools to draw the network diagram and identify the critical path once we have a proper WBS in place.
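
For readers who want to see what such a tool computes, here is a minimal Python sketch of the critical path calculation. It assumes only finish-to-start links with no lag or lead, and the task names and durations are hypothetical. It does a forward pass for the earliest finish of each task, a backward pass for the latest allowed finish, and reports the float of each task; tasks with zero float are critical.

```python
def critical_path(tasks):
    """tasks maps name -> (duration, [predecessor names]); finish-to-start links only.

    Returns (project_duration, critical_task_names, float_per_task).
    """
    # Order the tasks so that every task comes after its predecessors.
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for pred in tasks[name][1]:
            visit(pred)
        order.append(name)

    for name in tasks:
        visit(name)

    # Forward pass: earliest finish of each task.
    earliest_finish = {}
    for name in order:
        duration, preds = tasks[name]
        earliest_start = max((earliest_finish[p] for p in preds), default=0)
        earliest_finish[name] = earliest_start + duration
    project_duration = max(earliest_finish.values())

    # Backward pass: latest finish that does not delay the project.
    latest_finish = {name: project_duration for name in tasks}
    for name in reversed(order):
        duration, preds = tasks[name]
        latest_start = latest_finish[name] - duration
        for pred in preds:
            latest_finish[pred] = min(latest_finish[pred], latest_start)

    floats = {name: latest_finish[name] - earliest_finish[name] for name in tasks}
    critical = [name for name in order if floats[name] == 0]
    return project_duration, critical, floats


# Hypothetical project: durations in days, B and C both depend on A, D on both.
tasks = {
    "A": (3, []),
    "B": (2, ["A"]),
    "C": (4, ["A"]),
    "D": (1, ["B", "C"]),
}

duration, critical, floats = critical_path(tasks)
print(duration)  # 8
print(critical)  # ['A', 'C', 'D']
print(floats)    # {'A': 0, 'B': 2, 'C': 0, 'D': 0}
```

In this example task B has 2 days of float, so whoever works on B could help on the critical task C for up to 2 days without delaying the project.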

E.g. the Nokia E71 launch: the product was advertised but not available in the market. This was due to a lack of scheduling.

Risk: A risk is an uncertain event. It can be negative (Threat) or positive (Opportunity), and the same event can be a threat or an opportunity depending on which side you are on. E.g. an earthquake is a threat to residents but an opportunity for builders.
Good risk management directly reduces cost.

There are four ways to manage each kind of risk (Threats/Opportunities):
Threat: 1. Mitigate - Reduce the probability or the impact
             2. Accept - Nothing can be done about it
             3. Transfer - Outsource / get insurance (but you are still accountable)
             4. Avoid - Change the scope / avoid the entire project / transfer or outsource
Opportunity: 1. Accept
                      2. Enhance - Increase the probability or impact of the opportunity to gain more advantage
                      3. Exploit
                      4. Share - Joint venture


Example:
1. Jack Welch of GE wrote the book "Winning" and proposed two principles:
        a. Always make mistakes
        b. Never repeat a mistake

2. Richard Branson, Virgin Atlantic
3. James Cameron, Hollywood director (They took unconventional risk. Google for more info.)
4. Dr. Kiran Mazumdar-Shaw - Pharmaceutical industry

 Risk Taking Capacity:
As project progresses, cost due to risk increases.


Risk Management Process:  
1. Risk Identification
2. Risk Qualification (80/20 principle)
3. Risk Response

Risk Priority/ Assessment (FMEA):
Based on a. Probability  
                b. Impact
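
One simple way to turn probability and impact into a priority is to multiply them and sort, as in the Python sketch below. This is an assumption about how the two factors are combined, not a full FMEA procedure; the risk names, the 0-to-1 probabilities and the 1-to-10 impact scores are hypothetical.

```python
def prioritize_risks(risks):
    """risks: list of (name, probability between 0 and 1, impact score).

    Returns (name, score) pairs sorted from highest to lowest score,
    where score = probability * impact.
    """
    scored = [(name, probability * impact) for name, probability, impact in risks]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical risk register.
risks = [
    ("Key supplier delay", 0.5, 8),
    ("Server outage", 0.25, 4),
    ("Scope change", 0.75, 4),
]

ranking = prioritize_risks(risks)
for name, score in ranking:
    print(name, score)
# Key supplier delay 4.0
# Scope change 3.0
# Server outage 1.0
```

The top of the ranking is where the risk response (mitigate, accept, transfer or avoid) should be planned first.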

Quality:
Quality = Conformity to requirements + Fitness for use
If it does not meet the requirements: the product is rejected
If it is not fit for use: get it fixed



Example: In Japan, per-person productivity (per-person GDP) is very high, due to the high quality standards they maintain.


Example: For one army project, a team used retired army personnel just to write the proposal in the army's own language, and it got selected, purely because of the familiarity of the language.

Sunday, August 19, 2012

My Camera Pics !!!

Happy Photographer's Day to all !!!




Wednesday, August 15, 2012

My Pics !!!

Here are some of my Pics which I have taken with my new camera:


Wednesday, March 17, 2010

Stop a nuclear disaster

Hi ,
Our government is churning out one hazardous bill after another. This time it is a bill called the Civil Liability for Nuclear Damage, and it's coming up for a vote in a couple of days.

The bill lets U.S. corporations off the hook for any nuclear accidents they cause on Indian soil. They'd only have to pay a meagre amount, and Indian taxpayers would be stuck paying crores for the nuclear clean up and to compensate the victims.

Without any public debate, the Prime Minister is appeasing American interests and ignoring our safety.

Greenpeace is launching a petition asking the PM to hold a public consultation before introducing the bill.

I have already signed this petition. Can you join me?

http://www.greenpeace.org/india/stop-the-vote2



Thanks!

agrahari007@gmail.com
 

Tuesday, February 9, 2010

Mind Map

A mind map is a diagram used to represent words, ideas, tasks, or other items linked to and arranged around a central key word or idea. Mind maps are used to generate, visualize, structure, and classify ideas, and as an aid in study, organization, problem solving, decision making, and writing.

To make the concept clearer, I've made a Mind Map of the mind-map concept, given below:





You can get more information on Wiki: http://en.wikipedia.org/wiki/Mind_map