Human Resource Analytics: Enhancing Organizational Effectiveness through HR Analytics

6 minute read

This HR Analytics project is designed to provide a holistic understanding of the organization by focusing on key areas.

Objectives

Objectives include analyzing current demographics and management structures, evaluating organizational performance, assessing employee satisfaction levels, and identifying and addressing counterproductive work behaviors. Through a combination of quantitative and qualitative methods, the project aims to deliver actionable insights to enhance overall organizational effectiveness and employee well-being.

Determine the current demographic and management structure of the organisation
Determine the level of organization performance
Determine the level of organization satisfaction
Determine the level of organization counterproductive work behaviour

Part One: Understanding the Background and Data

Content

The CSV revolves around a fictitious company and the core data set contains names, DOBs, age, gender, marital status, date of hire, reasons for termination, department, whether they are active or terminated, position title, pay rate, manager name, and performance score.

Recent additions to the data include:

Absences

Most Recent Performance Review Date

Employee Engagement Score

Acknowledgements

Dr. Carla Patalano provided the baseline idea for creating this synthetic data set, which has been used now by over 200 Human Resource Management students at the college. Students in the course learn data visualization techniques with Tableau Desktop and use this data set to complete a series of assignments.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("/kaggle/input/human-resources-data-set/HRDataset_v14.csv")

Information about the data and table example

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 311 entries, 0 to 310
Data columns (total 36 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 Employee_Name               311 non-null    object 
 EmpID                       311 non-null    int64  
 MarriedID                   311 non-null    int64  
 MaritalStatusID             311 non-null    int64  
 GenderID                    311 non-null    int64  
 EmpStatusID                 311 non-null    int64  
 DeptID                      311 non-null    int64  
 PerfScoreID                 311 non-null    int64  
 FromDiversityJobFairID      311 non-null    int64  
 Salary                      311 non-null    int64  
Termd                       311 non-null    int64  
PositionID                  311 non-null    int64  
Position                    311 non-null    object 
State                       311 non-null    object 
Zip                         311 non-null    int64  
DOB                         311 non-null    object 
Sex                         311 non-null    object 
MaritalDesc                 311 non-null    object 
CitizenDesc                 311 non-null    object 
HispanicLatino              311 non-null    object 
RaceDesc                    311 non-null    object 
DateofHire                  311 non-null    object 
DateofTermination           104 non-null    object 
TermReason                  311 non-null    object 
EmploymentStatus            311 non-null    object 
Department                  311 non-null    object 
ManagerName                 311 non-null    object 
ManagerID                   303 non-null    float64
RecruitmentSource           311 non-null    object 
PerformanceScore            311 non-null    object 
EngagementSurvey            311 non-null    float64
EmpSatisfaction             311 non-null    int64  
SpecialProjectsCount        311 non-null    int64  
LastPerformanceReview_Date  311 non-null    object 
DaysLateLast30              311 non-null    int64  
Absences                    311 non-null    int64  
dtypes: float64(2), int64(16), object(18)
memory usage: 87.6+ KB

df.head()

	Employee_Name	EmpID	MarriedID	MaritalStatusID	GenderID	EmpStatusID	DeptID	PerfScoreID	Salary	...	ManagerName	ManagerID	RecruitmentSource	PerformanceScore	EngagementSurvey	EmpSatisfaction	SpecialProjectsCount	LastPerformanceReview_Date	Absences
0	Adinolfi, Wilson K	10026	0	0	1	1	5	4	62506	...	Michael Albert	22.0	LinkedIn	Exceeds	4.60	5	0	1/17/2019	1
1	Ait Sidi, Karthikeyan	10084	1	1	1	5	3	3	104437	...	Simon Roup	4.0	Indeed	Fully Meets	4.96	3	6	2/24/2016	17
2	Akinkuolie, Sarah	10196	1	1	0	5	5	3	64955	...	Kissy Sullivan	20.0	LinkedIn	Fully Meets	3.02	3	0	5/15/2012	3
3	Alagbe,Trina	10088	1	1	0	1	5	3	64991	...	Elijiah Gray	16.0	Indeed	Fully Meets	4.84	5	0	1/3/2019	15
4	Anderson, Carol	10069	0	2	0	5	5	3	50825	...	Webster Butler	39.0	Google Search	Fully Meets	5.00	4	0	2/1/2016	2

5 rows × 36 columns

Part Two: Exploratory Data Analysis

Demographics

Sex distribution in the organisation

sns.catplot(x='Sex',kind='count',data=df)

<seaborn.axisgrid.FacetGrid at 0x7b6614918970>

png

Ethnicity distribution in the organisation

plt.figure(figsize=(14,10), dpi=200)
sns.catplot(x='RaceDesc',kind='count', data=df)
plt.xticks(rotation=90)

png

Location-Based Distribution

sns.histplot(data=df, x="State")
plt.xticks(rotation=90)

png

Organisation structure

Position Distribution Overview

pd.DataFrame(df["Position"].value_counts())

	count
Position
Production Technician I	137
Production Technician II	57
Area Sales Manager	27
Production Manager	14
Software Engineer	10
IT Support	8
Data Analyst	7
Sr. Network Engineer	5
Database Administrator	5
Network Engineer	5
BI Developer	4
Senior BI Developer	3
Administrative Assistant	3
Sales Manager	3
Accountant I	3
Sr. DBA	2
IT Manager - DB	2
Sr. Accountant	2
Director of Operations	1
Shared Services Manager	1
Data Analyst	1
Data Architect	1
Principal Data Architect	1
IT Manager - Infra	1
President & CEO	1
Enterprise Architect	1
BI Director	1
Director of Sales	1
IT Director	1
IT Manager - Support	1
Software Engineering Manager	1
CIO	1

How many managers organization have? How many subordinates does a manager have?

managersN = len(df["ManagerID"].unique())
employee_counts = df['ManagerID'].value_counts()
print(f"There is {managersN} managers in total. Every manager have to manage {round(employee_counts.mean(), 2)} employees in average.")

There is 24 managers in total. Every manager have to manage 13.17 employees in average.

How many managers are per department?

managers_per_department = df.groupby('Department')['ManagerID'].nunique()
managers_per_department

Department
Admin Offices            4
Executive Office         1
IT/IS                    6
Production              11
Sales                    4
Software Engineering     3
Name: ManagerID, dtype: int64

What is employees distribution across departments?

sns.catplot(x= 'Department',kind='count',data=df, hue="Sex")
plt.xticks(rotation=45)

png

Organisation Performance

What is the distribution of Performance Score in the organization?

color_palette = sns.color_palette("flare")
sns.set_palette(color_palette)
sns.catplot(x = "PerformanceScore",kind='count', data = df)
plt.xticks(rotation=45)

png

sns.catplot(x= 'PerformanceScore',kind='count',data=df, hue="Department")
plt.xticks(rotation=45)

png

Due to absolute count and unequal distributions of employees per depertment, performance score among departments can’t be compared. Therefore, relative proportion, taking into account the relative number of employee per depertment, is calculated for better comparison.

# Calculate the percentage of each category within each department
percentages = df.groupby(['Department', 'PerformanceScore']).size().div(df.groupby('Department').size()).mul(100).reset_index(name='Percentage')
# Create a pivot table for better organization
pivot_table = pd.pivot_table(percentages, values='Percentage', index='PerformanceScore', columns='Department')
pivot_table

Department	Admin Offices	Executive Office	IT/IS	Production	Sales	Software Engineering
PerformanceScore
Exceeds	NaN	NaN	12.0	12.918660	6.451613	18.181818
Fully Meets	100.0	100.0	84.0	76.076555	77.419355	72.727273
Needs Improvement	NaN	NaN	2.0	7.177033	3.225806	9.090909
PIP	NaN	NaN	2.0	3.827751	12.903226	NaN

How many special projects does each department have?

sns.catplot(x= 'SpecialProjectsCount',kind='count',data=df[(df["SpecialProjectsCount"]> 0)] , hue="Department")
plt.xticks(rotation=45)

png

Interpretation The data shows that the Sales Department has the highest relative proportion for more improvement (Needs Improvement and PIP). In general, the organisation seems to have a good performance review. Also, the interesting highlight is that Sales does not have any Special projects, and the Production Department has only a couple of them. This highlight might indicate the origin of a lack of satisfying evaluation due to motivation levels. This relationship should be further explored.

Organisation Satisfaction

What is employees satisfaction and salary across departments?

dep_sat = df.groupby('Department')['EmpSatisfaction'].mean().reset_index()
sns.barplot(x='Department',y='EmpSatisfaction',data=dep_sat)
plt.xticks(rotation=45)

png

dep_sal = df.groupby('Department')['Salary'].mean().reset_index()
sns.barplot(x='Department',y='Salary',data=dep_sal)
plt.xticks(rotation=45)

png

Interpretation It seems that there is no huge differences in employee satisfaction among departments. Although, Executive Office shows significant drop in satisfaction, in spite of the highest level of salary. Such insight indicates that origin of lower level of satisfaction in Executive Office can be attributed to other factors (for example, higher level of stress, etc.).

What is the level of employee satisfaction in respect to their performance score and salary?

per_sat = df.groupby('PerformanceScore')['EmpSatisfaction'].mean().reset_index()
sns.barplot(x='PerformanceScore',y='EmpSatisfaction',data=per_sat)
plt.xticks(rotation=45)

png

dep_sal = df.groupby('PerformanceScore')['Salary'].mean().reset_index()
sns.barplot(x='PerformanceScore',y='Salary',data=dep_sal)
plt.xticks(rotation=45)

png

Interpretation Evidently, the employees that had lowest performance score also had lowest satisfaction and salary. Such a finding requires a deeper analysis, to seek the casual relation - is low satisfaction and salary causes low performance score or vice versa.

Counterproductive work behavior

What is the degree of absenteeism across departments and performance score level?

dep_sal = df.groupby('Department')['Absences'].mean().reset_index()
sns.barplot(x='Department',y='Absences',data=dep_sal)
plt.xticks(rotation=45)

png

dep_sal = df.groupby('PerformanceScore')['Absences'].mean().reset_index()
sns.barplot(x='PerformanceScore',y='Absences',data=dep_sal)
plt.xticks(rotation=45)

png

What is the degree of tardiness across departments and performance score level?

dep_sal = df.groupby('Department')['DaysLateLast30'].mean().reset_index()
sns.barplot(x='Department',y='DaysLateLast30',data=dep_sal)
plt.xticks(rotation=45)

png

dep_sal = df.groupby('PerformanceScore')['DaysLateLast30'].mean().reset_index()
sns.barplot(x='PerformanceScore',y='DaysLateLast30',data=dep_sal)
plt.xticks(rotation=45)

png

Interpretation Data show that Sales and Production Departments have highest level of counterproductive working behaviour, as well group that have lowest performance score (both departments are highly represented in those groups).

Conclusion

The analysis of organizational data reveals key insights. The Sales Department shows a notable need for improvement, suggesting potential areas for management attention and motivation enhancement. Conversely, the Executive Office experiences a significant drop in satisfaction despite having the highest salary, indicating the influence of non-monetary factors on job satisfaction. The correlation between low performance scores, lower satisfaction, and salary levels prompts further investigation into causal relationships. Additionally, Sales and Production Departments exhibit higher levels of counterproductive behavior, particularly in groups with lower performance scores, emphasizing the need for targeted strategies to mitigate such behaviors. Overall, these findings provide a foundation for focused interventions to improve organizational performance, employee satisfaction, and well-being.

Share on

Twitter Facebook LinkedIn

Marko Antolčić