6 minute read

This HR Analytics project is designed to provide a holistic understanding of the organization by focusing on key areas.

Objectives

Objectives include analyzing current demographics and management structures, evaluating organizational performance, assessing employee satisfaction levels, and identifying and addressing counterproductive work behaviors. Through a combination of quantitative and qualitative methods, the project aims to deliver actionable insights to enhance overall organizational effectiveness and employee well-being.

  1. Determine the current demographic and management structure of the organisation
  2. Determine the level of organization performance
  3. Determine the level of organization satisfaction
  4. Determine the level of organization counterproductive work behaviour

Part One: Understanding the Background and Data

Content

The CSV revolves around a fictitious company and the core data set contains names, DOBs, age, gender, marital status, date of hire, reasons for termination, department, whether they are active or terminated, position title, pay rate, manager name, and performance score.

Recent additions to the data include:

Absences

Most Recent Performance Review Date

Employee Engagement Score

Acknowledgements

Dr. Carla Patalano provided the baseline idea for creating this synthetic data set, which has been used now by over 200 Human Resource Management students at the college. Students in the course learn data visualization techniques with Tableau Desktop and use this data set to complete a series of assignments.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv("/kaggle/input/human-resources-data-set/HRDataset_v14.csv")

Information about the data and table example

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 311 entries, 0 to 310
Data columns (total 36 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   Employee_Name               311 non-null    object 
 1   EmpID                       311 non-null    int64  
 2   MarriedID                   311 non-null    int64  
 3   MaritalStatusID             311 non-null    int64  
 4   GenderID                    311 non-null    int64  
 5   EmpStatusID                 311 non-null    int64  
 6   DeptID                      311 non-null    int64  
 7   PerfScoreID                 311 non-null    int64  
 8   FromDiversityJobFairID      311 non-null    int64  
 9   Salary                      311 non-null    int64  
 10  Termd                       311 non-null    int64  
 11  PositionID                  311 non-null    int64  
 12  Position                    311 non-null    object 
 13  State                       311 non-null    object 
 14  Zip                         311 non-null    int64  
 15  DOB                         311 non-null    object 
 16  Sex                         311 non-null    object 
 17  MaritalDesc                 311 non-null    object 
 18  CitizenDesc                 311 non-null    object 
 19  HispanicLatino              311 non-null    object 
 20  RaceDesc                    311 non-null    object 
 21  DateofHire                  311 non-null    object 
 22  DateofTermination           104 non-null    object 
 23  TermReason                  311 non-null    object 
 24  EmploymentStatus            311 non-null    object 
 25  Department                  311 non-null    object 
 26  ManagerName                 311 non-null    object 
 27  ManagerID                   303 non-null    float64
 28  RecruitmentSource           311 non-null    object 
 29  PerformanceScore            311 non-null    object 
 30  EngagementSurvey            311 non-null    float64
 31  EmpSatisfaction             311 non-null    int64  
 32  SpecialProjectsCount        311 non-null    int64  
 33  LastPerformanceReview_Date  311 non-null    object 
 34  DaysLateLast30              311 non-null    int64  
 35  Absences                    311 non-null    int64  
dtypes: float64(2), int64(16), object(18)
memory usage: 87.6+ KB
df.head()
Employee_Name EmpID MarriedID MaritalStatusID GenderID EmpStatusID DeptID PerfScoreID FromDiversityJobFairID Salary ... ManagerName ManagerID RecruitmentSource PerformanceScore EngagementSurvey EmpSatisfaction SpecialProjectsCount LastPerformanceReview_Date DaysLateLast30 Absences
0 Adinolfi, Wilson K 10026 0 0 1 1 5 4 0 62506 ... Michael Albert 22.0 LinkedIn Exceeds 4.60 5 0 1/17/2019 0 1
1 Ait Sidi, Karthikeyan 10084 1 1 1 5 3 3 0 104437 ... Simon Roup 4.0 Indeed Fully Meets 4.96 3 6 2/24/2016 0 17
2 Akinkuolie, Sarah 10196 1 1 0 5 5 3 0 64955 ... Kissy Sullivan 20.0 LinkedIn Fully Meets 3.02 3 0 5/15/2012 0 3
3 Alagbe,Trina 10088 1 1 0 1 5 3 0 64991 ... Elijiah Gray 16.0 Indeed Fully Meets 4.84 5 0 1/3/2019 0 15
4 Anderson, Carol 10069 0 2 0 5 5 3 0 50825 ... Webster Butler 39.0 Google Search Fully Meets 5.00 4 0 2/1/2016 0 2

5 rows × 36 columns

Part Two: Exploratory Data Analysis

Demographics

Sex distribution in the organisation

sns.catplot(x='Sex',kind='count',data=df)
<seaborn.axisgrid.FacetGrid at 0x7b6614918970>

png

Ethnicity distribution in the organisation

plt.figure(figsize=(14,10), dpi=200)
sns.catplot(x='RaceDesc',kind='count', data=df)
plt.xticks(rotation=90)

png

Location-Based Distribution

sns.histplot(data=df, x="State")
plt.xticks(rotation=90)

png

Organisation structure

Position Distribution Overview

pd.DataFrame(df["Position"].value_counts())
count
Position
Production Technician I 137
Production Technician II 57
Area Sales Manager 27
Production Manager 14
Software Engineer 10
IT Support 8
Data Analyst 7
Sr. Network Engineer 5
Database Administrator 5
Network Engineer 5
BI Developer 4
Senior BI Developer 3
Administrative Assistant 3
Sales Manager 3
Accountant I 3
Sr. DBA 2
IT Manager - DB 2
Sr. Accountant 2
Director of Operations 1
Shared Services Manager 1
Data Analyst 1
Data Architect 1
Principal Data Architect 1
IT Manager - Infra 1
President & CEO 1
Enterprise Architect 1
BI Director 1
Director of Sales 1
IT Director 1
IT Manager - Support 1
Software Engineering Manager 1
CIO 1

How many managers organization have? How many subordinates does a manager have?

managersN = len(df["ManagerID"].unique())
employee_counts = df['ManagerID'].value_counts()
print(f"There is {managersN} managers in total. Every manager have to manage {round(employee_counts.mean(), 2)} employees in average.")

There is 24 managers in total. Every manager have to manage 13.17 employees in average.

How many managers are per department?

managers_per_department = df.groupby('Department')['ManagerID'].nunique()
managers_per_department
Department
Admin Offices            4
Executive Office         1
IT/IS                    6
Production              11
Sales                    4
Software Engineering     3
Name: ManagerID, dtype: int64

What is employees distribution across departments?

sns.catplot(x= 'Department',kind='count',data=df, hue="Sex")
plt.xticks(rotation=45)

png

Organisation Performance

What is the distribution of Performance Score in the organization?

color_palette = sns.color_palette("flare")
sns.set_palette(color_palette)
sns.catplot(x = "PerformanceScore",kind='count', data = df)
plt.xticks(rotation=45)

png

sns.catplot(x= 'PerformanceScore',kind='count',data=df, hue="Department")
plt.xticks(rotation=45)

png

Due to absolute count and unequal distributions of employees per depertment, performance score among departments can’t be compared. Therefore, relative proportion, taking into account the relative number of employee per depertment, is calculated for better comparison.

# Calculate the percentage of each category within each department
percentages = df.groupby(['Department', 'PerformanceScore']).size().div(df.groupby('Department').size()).mul(100).reset_index(name='Percentage')
# Create a pivot table for better organization
pivot_table = pd.pivot_table(percentages, values='Percentage', index='PerformanceScore', columns='Department')
pivot_table

Department Admin Offices Executive Office IT/IS Production Sales Software Engineering
PerformanceScore
Exceeds NaN NaN 12.0 12.918660 6.451613 18.181818
Fully Meets 100.0 100.0 84.0 76.076555 77.419355 72.727273
Needs Improvement NaN NaN 2.0 7.177033 3.225806 9.090909
PIP NaN NaN 2.0 3.827751 12.903226 NaN

How many special projects does each department have?

sns.catplot(x= 'SpecialProjectsCount',kind='count',data=df[(df["SpecialProjectsCount"]> 0)] , hue="Department")
plt.xticks(rotation=45)

png

Interpretation The data shows that the Sales Department has the highest relative proportion for more improvement (Needs Improvement and PIP). In general, the organisation seems to have a good performance review. Also, the interesting highlight is that Sales does not have any Special projects, and the Production Department has only a couple of them. This highlight might indicate the origin of a lack of satisfying evaluation due to motivation levels. This relationship should be further explored.

Organisation Satisfaction

What is employees satisfaction and salary across departments?

dep_sat = df.groupby('Department')['EmpSatisfaction'].mean().reset_index()
sns.barplot(x='Department',y='EmpSatisfaction',data=dep_sat)
plt.xticks(rotation=45)

png

dep_sal = df.groupby('Department')['Salary'].mean().reset_index()
sns.barplot(x='Department',y='Salary',data=dep_sal)
plt.xticks(rotation=45)

png

Interpretation It seems that there is no huge differences in employee satisfaction among departments. Although, Executive Office shows significant drop in satisfaction, in spite of the highest level of salary. Such insight indicates that origin of lower level of satisfaction in Executive Office can be attributed to other factors (for example, higher level of stress, etc.).

What is the level of employee satisfaction in respect to their performance score and salary?

per_sat = df.groupby('PerformanceScore')['EmpSatisfaction'].mean().reset_index()
sns.barplot(x='PerformanceScore',y='EmpSatisfaction',data=per_sat)
plt.xticks(rotation=45)

png

dep_sal = df.groupby('PerformanceScore')['Salary'].mean().reset_index()
sns.barplot(x='PerformanceScore',y='Salary',data=dep_sal)
plt.xticks(rotation=45)

png

Interpretation Evidently, the employees that had lowest performance score also had lowest satisfaction and salary. Such a finding requires a deeper analysis, to seek the casual relation - is low satisfaction and salary causes low performance score or vice versa.

Counterproductive work behavior

What is the degree of absenteeism across departments and performance score level?

dep_sal = df.groupby('Department')['Absences'].mean().reset_index()
sns.barplot(x='Department',y='Absences',data=dep_sal)
plt.xticks(rotation=45)

png

dep_sal = df.groupby('PerformanceScore')['Absences'].mean().reset_index()
sns.barplot(x='PerformanceScore',y='Absences',data=dep_sal)
plt.xticks(rotation=45)

png

What is the degree of tardiness across departments and performance score level?

dep_sal = df.groupby('Department')['DaysLateLast30'].mean().reset_index()
sns.barplot(x='Department',y='DaysLateLast30',data=dep_sal)
plt.xticks(rotation=45)

png

dep_sal = df.groupby('PerformanceScore')['DaysLateLast30'].mean().reset_index()
sns.barplot(x='PerformanceScore',y='DaysLateLast30',data=dep_sal)
plt.xticks(rotation=45)

png

Interpretation Data show that Sales and Production Departments have highest level of counterproductive working behaviour, as well group that have lowest performance score (both departments are highly represented in those groups).

Conclusion

The analysis of organizational data reveals key insights. The Sales Department shows a notable need for improvement, suggesting potential areas for management attention and motivation enhancement. Conversely, the Executive Office experiences a significant drop in satisfaction despite having the highest salary, indicating the influence of non-monetary factors on job satisfaction. The correlation between low performance scores, lower satisfaction, and salary levels prompts further investigation into causal relationships. Additionally, Sales and Production Departments exhibit higher levels of counterproductive behavior, particularly in groups with lower performance scores, emphasizing the need for targeted strategies to mitigate such behaviors. Overall, these findings provide a foundation for focused interventions to improve organizational performance, employee satisfaction, and well-being.