Overview

Dataset statistics

Number of variables: 11
Number of observations: 880848
Missing cells: 709050
Missing cells (%): 7.3%
Duplicate rows: 1907
Duplicate rows (%): 0.2%
Total size in memory: 73.9 MiB
Average record size in memory: 88.0 B
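
The percentages and the average record size can be re-derived from the counts in this table. The snippet below is a minimal sketch of that arithmetic; it assumes that "Missing cells (%)" is taken over all cells (variables times observations) and that 1 MiB is 1024 * 1024 bytes, neither of which the report states explicitly.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 11
n_observations = 880_848
missing_cells = 709_050
duplicate_rows = 1_907
total_size_mib = 73.9

# Assumption: the missing-cell percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the three printed values round to the figures reported in the table.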

Variable types

Categorical: 7
Numeric: 3
Unsupported: 1

Alerts

Dataset has 1907 (0.2%) duplicate rows [Duplicates]
month has a high cardinality: 393 distinct values [High cardinality]
block has a high cardinality: 2616 distinct values [High cardinality]
street_name has a high cardinality: 579 distinct values [High cardinality]
floor_area_sqm is highly correlated with lease_commence_date and 1 other fields [High correlation]
lease_commence_date is highly correlated with floor_area_sqm and 1 other fields [High correlation]
resale_price is highly correlated with floor_area_sqm and 1 other fields [High correlation]
floor_area_sqm is highly correlated with resale_price [High correlation]
lease_commence_date is highly correlated with resale_price [High correlation]
resale_price is highly correlated with floor_area_sqm and 1 other fields [High correlation]
flat_type is highly correlated with flat_model [High correlation]
flat_model is highly correlated with flat_type [High correlation]
town is highly correlated with flat_model and 1 other fields [High correlation]
flat_type is highly correlated with floor_area_sqm and 1 other fields [High correlation]
floor_area_sqm is highly correlated with flat_type and 3 other fields [High correlation]
flat_model is highly correlated with town and 4 other fields [High correlation]
lease_commence_date is highly correlated with town and 3 other fields [High correlation]
resale_price is highly correlated with floor_area_sqm and 2 other fields [High correlation]
remaining_lease has 709050 (80.5%) missing values [Missing]
remaining_lease is an unsupported type, check if it needs cleaning or further analysis [Unsupported]

Reproduction

Analysis started: 2022-11-05 03:40:26.524991
Analysis finished: 2022-11-05 03:40:40.617995
Duration: 14.09 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

month
Categorical

HIGH CARDINALITY

Distinct393
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size6.7 MiB
1999-03
 
6465
1999-06
 
5861
1998-10
 
5709
1999-04
 
5698
1999-05
 
5671
Other values (388)
851444 

Length

Max length7
Median length7
Mean length7
Min length7

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1990-01
2nd row1990-01
3rd row1990-01
4th row1990-01
5th row1990-01

Common Values

Value | Count | Frequency (%)
1999-03 | 6465 | 0.7%
1999-06 | 5861 | 0.7%
1998-10 | 5709 | 0.6%
1999-04 | 5698 | 0.6%
1999-05 | 5671 | 0.6%
1999-07 | 5493 | 0.6%
1999-08 | 5209 | 0.6%
1998-11 | 4993 | 0.6%
1998-12 | 4988 | 0.6%
1999-02 | 4834 | 0.5%
Other values (383) | 825927 | 93.8%

Length

Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

town
Categorical

HIGH CORRELATION

Distinct27
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size6.7 MiB
TAMPINES
76922 
YISHUN
66779 
BEDOK
64355 
JURONG WEST
63704 
WOODLANDS
61874 
Other values (22)
547214 

Length

Max length15
Median length9
Mean length9.029695248
Min length5

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowJURONG WEST
2nd rowJURONG EAST
3rd rowJURONG EAST
4th rowJURONG EAST
5th rowJURONG EAST

Common Values

Value | Count | Frequency (%)
TAMPINES | 76922 | 8.7%
YISHUN | 66779 | 7.6%
BEDOK | 64355 | 7.3%
JURONG WEST | 63704 | 7.2%
WOODLANDS | 61874 | 7.0%
ANG MO KIO | 50307 | 5.7%
HOUGANG | 48257 | 5.5%
BUKIT BATOK | 41969 | 4.8%
CHOA CHU KANG | 36180 | 4.1%
BUKIT MERAH | 32650 | 3.7%
Other values (17) | 337851 | 38.4%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
bukit | 103322 | 7.8%
jurong | 87623 | 6.6%
tampines | 76922 | 5.8%
yishun | 66779 | 5.1%
bedok | 64355 | 4.9%
west | 63704 | 4.8%
woodlands | 61874 | 4.7%
ang | 50307 | 3.8%
mo | 50307 | 3.8%
kio | 50307 | 3.8%
Other values (27) | 646239 | 48.9%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

flat_type
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct8
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size6.7 MiB
4 ROOM
332214 
3 ROOM
284617 
5 ROOM
184725 
EXECUTIVE
66837 
2 ROOM
 
10632
Other values (3)
 
1823

Length

Max length16
Median length6
Mean length6.233662334
Min length6

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row3 ROOM
2nd row4 ROOM
3rd row4 ROOM
4th row4 ROOM
5th row4 ROOM

Common Values

Value | Count | Frequency (%)
4 ROOM | 332214 | 37.7%
3 ROOM | 284617 | 32.3%
5 ROOM | 184725 | 21.0%
EXECUTIVE | 66837 | 7.6%
2 ROOM | 10632 | 1.2%
1 ROOM | 1292 | 0.1%
MULTI GENERATION | 279 | < 0.1%
MULTI-GENERATION | 252 | < 0.1%
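
The last two rows differ only by a hyphen, which suggests the same category is spelled two ways. The sketch below shows one way to merge such variants before analysis. It assumes that "MULTI GENERATION" and "MULTI-GENERATION" refer to the same flat type, which the report does not confirm; `normalize_flat_type` is a hypothetical helper name.

```python
def normalize_flat_type(value: str) -> str:
    """Collapse hyphen and whitespace variants into a single spelling."""
    # Assumption: a hyphen and a space are interchangeable in these labels.
    return " ".join(value.replace("-", " ").upper().split())


# Counts copied from the common-values table for flat_type.
counts = {"MULTI GENERATION": 279, "MULTI-GENERATION": 252, "1 ROOM": 1292}

merged = {}
for label, count in counts.items():
    key = normalize_flat_type(label)
    merged[key] = merged.get(key, 0) + count

print(merged)
```

Applying the same function to the raw column before profiling would reduce the distinct count for flat_type from 8 to 7, if the assumption holds.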

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
room | 813480 | 48.0%
4 | 332214 | 19.6%
3 | 284617 | 16.8%
5 | 184725 | 10.9%
executive | 66837 | 3.9%
2 | 10632 | 0.6%
1 | 1292 | 0.1%
multi | 279 | < 0.1%
generation | 279 | < 0.1%
multi-generation | 252 | < 0.1%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

block
Categorical

HIGH CARDINALITY

Distinct2616
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size6.7 MiB
2
 
4475
1
 
3919
110
 
3307
101
 
3301
4
 
3245
Other values (2611)
862601 

Length

Max length4
Median length3
Mean length2.918779403
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique8 ?
Unique (%)< 0.1%

Sample

1st row172
2nd row322
3rd row218
4th row408
5th row251

Common Values

Value | Count | Frequency (%)
2 | 4475 | 0.5%
1 | 3919 | 0.4%
110 | 3307 | 0.4%
101 | 3301 | 0.4%
4 | 3245 | 0.4%
8 | 3208 | 0.4%
113 | 3192 | 0.4%
107 | 3121 | 0.4%
3 | 3111 | 0.4%
114 | 3096 | 0.4%
Other values (2606) | 846873 | 96.1%

Length

Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

street_name
Categorical

HIGH CARDINALITY

Distinct579
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size6.7 MiB
YISHUN RING RD
 
16891
BEDOK RESERVOIR RD
 
14282
ANG MO KIO AVE 10
 
13401
ANG MO KIO AVE 3
 
11833
HOUGANG AVE 8
 
9036
Other values (574)
815405 

Length

Max length22
Median length14
Mean length14.01041042
Min length7

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st rowBOON LAY DR
2nd rowJURONG EAST ST 31
3rd rowJURONG EAST ST 21
4th rowPANDAN GDNS
5th rowJURONG EAST ST 24

Common Values

Value | Count | Frequency (%)
YISHUN RING RD | 16891 | 1.9%
BEDOK RESERVOIR RD | 14282 | 1.6%
ANG MO KIO AVE 10 | 13401 | 1.5%
ANG MO KIO AVE 3 | 11833 | 1.3%
HOUGANG AVE 8 | 9036 | 1.0%
TAMPINES ST 21 | 8042 | 0.9%
BEDOK NTH ST 3 | 7340 | 0.8%
BEDOK NTH RD | 7246 | 0.8%
ANG MO KIO AVE 4 | 7013 | 0.8%
MARSILING DR | 6444 | 0.7%
Other values (569) | 779320 | 88.5%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
st | 275144 | 9.7%
ave | 224615 | 7.9%
rd | 149428 | 5.3%
west | 70430 | 2.5%
dr | 68235 | 2.4%
tampines | 68075 | 2.4%
yishun | 66779 | 2.4%
jurong | 63628 | 2.2%
1 | 53029 | 1.9%
bedok | 52407 | 1.9%
Other values (320) | 1738152 | 61.4%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

storey_range
Categorical

Distinct25
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size6.7 MiB
04 TO 06
222011 
07 TO 09
200273 
01 TO 03
178379 
10 TO 12
170193 
13 TO 15
57640 
Other values (20)
52352 

Length

Max length8
Median length8
Mean length8
Min length8

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row04 TO 06
2nd row07 TO 09
3rd row10 TO 12
4th row01 TO 03
5th row04 TO 06

Common Values

Value | Count | Frequency (%)
04 TO 06 | 222011 | 25.2%
07 TO 09 | 200273 | 22.7%
01 TO 03 | 178379 | 20.3%
10 TO 12 | 170193 | 19.3%
13 TO 15 | 57640 | 6.5%
16 TO 18 | 22092 | 2.5%
19 TO 21 | 10516 | 1.2%
22 TO 24 | 6848 | 0.8%
25 TO 27 | 3027 | 0.3%
01 TO 05 | 2700 | 0.3%
Other values (15) | 7169 | 0.8%
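
Because every storey_range label has the fixed form "NN TO NN", it can be split into numeric bounds for modelling. The sketch below is one possible approach, not part of the report; `parse_storey_range` and `storey_midpoint` are hypothetical helper names. Note that the table contains both "01 TO 03" and "01 TO 05", so the labels appear to come from overlapping binning schemes and a midpoint is only an approximation.

```python
import re
from typing import Tuple

# Labels in the table all follow the pattern "NN TO NN".
STOREY_PATTERN = re.compile(r"^(\d{2}) TO (\d{2})$")


def parse_storey_range(label: str) -> Tuple[int, int]:
    """Return the (lowest, highest) storey encoded in a label such as '04 TO 06'."""
    match = STOREY_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"unexpected storey_range label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def storey_midpoint(label: str) -> float:
    """Approximate a label by the midpoint of its bounds."""
    low, high = parse_storey_range(label)
    return (low + high) / 2


for label in ["04 TO 06", "01 TO 03", "01 TO 05"]:
    print(label, parse_storey_range(label), storey_midpoint(label))
```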

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
to | 880848 | 33.3%
06 | 224485 | 8.5%
04 | 222011 | 8.4%
07 | 200273 | 7.6%
09 | 200273 | 7.6%
01 | 181079 | 6.9%
03 | 178379 | 6.8%
10 | 172667 | 6.5%
12 | 170193 | 6.4%
15 | 58899 | 2.2%
Other values (30) | 153437 | 5.8%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

floor_area_sqm
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct209
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean95.70626726
Minimum28
Maximum307
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size6.7 MiB

Quantile statistics

Minimum: 28
5-th percentile: 60
Q1: 73
Median: 93
Q3: 113
95-th percentile: 145
Maximum: 307
Range: 279
Interquartile range (IQR): 40

Descriptive statistics

Standard deviation: 25.93102825
Coefficient of variation (CV): 0.270943889
Kurtosis: -0.3630683132
Mean: 95.70626726
Median Absolute Deviation (MAD): 20
Skewness: 0.3703632387
Sum: 84302674.1
Variance: 672.4182262
Monotonicity: Not monotonic
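
Several of these statistics are simple functions of the others. The sketch below recomputes them from the reported floor_area_sqm figures, assuming the usual definitions (CV as standard deviation over mean, variance as squared standard deviation, IQR as Q3 minus Q1); the report itself does not spell out its formulas.

```python
# Figures copied from the floor_area_sqm statistics.
mean = 95.70626726
std = 25.93102825
q1, q3 = 73, 113
minimum, maximum = 28, 307

# Assumed definitions; the report does not state them.
cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance from the standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV={cv:.9f} variance={variance:.4f} IQR={iqr} range={value_range}")
```

The recomputed values agree with the reported CV, variance, IQR, and range to within rounding.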
Histogram with fixed size bins (bins=50)
Most common values

Value | Count | Frequency (%)
67 | 66600 | 7.6%
104 | 45675 | 5.2%
68 | 37494 | 4.3%
84 | 35086 | 4.0%
121 | 28465 | 3.2%
92 | 27785 | 3.2%
73 | 27374 | 3.1%
91 | 25908 | 2.9%
65 | 25776 | 2.9%
103 | 25617 | 2.9%
Other values (199) | 535068 | 60.7%

Smallest values

Value | Count | Frequency (%)
28 | 33 | < 0.1%
29 | 420 | < 0.1%
31 | 839 | 0.1%
34 | 75 | < 0.1%
35 | 22 | < 0.1%
37 | 12 | < 0.1%
38 | 154 | < 0.1%
39 | 146 | < 0.1%
40 | 744 | 0.1%
41 | 368 | < 0.1%

Largest values

Value | Count | Frequency (%)
307 | 1 | < 0.1%
297 | 2 | < 0.1%
280 | 4 | < 0.1%
266 | 4 | < 0.1%
261 | 6 | < 0.1%
259 | 2 | < 0.1%
250 | 3 | < 0.1%
249 | 3 | < 0.1%
246 | 2 | < 0.1%
243 | 16 | < 0.1%

flat_model
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct21
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size6.7 MiB
MODEL A
246730 
IMPROVED
230874 
NEW GENERATION
183648 
SIMPLIFIED
55842 
PREMIUM APARTMENT
41949 
Other values (16)
121805 

Length

Max length22
Median length8
Mean length9.653708699
Min length4

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowIMPROVED
2nd rowNEW GENERATION
3rd rowNEW GENERATION
4th rowNEW GENERATION
5th rowNEW GENERATION

Common Values

Value | Count | Frequency (%)
MODEL A | 246730 | 28.0%
IMPROVED | 230874 | 26.2%
NEW GENERATION | 183648 | 20.8%
SIMPLIFIED | 55842 | 6.3%
PREMIUM APARTMENT | 41949 | 4.8%
STANDARD | 41426 | 4.7%
APARTMENT | 34070 | 3.9%
MAISONETTE | 28570 | 3.2%
MODEL A2 | 9631 | 1.1%
DBSS | 2785 | 0.3%
Other values (11) | 5323 | 0.6%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
model | 258361 | 18.9%
a | 246730 | 18.0%
improved | 230874 | 16.9%
generation | 184179 | 13.5%
new | 183648 | 13.4%
apartment | 76111 | 5.6%
simplified | 55842 | 4.1%
premium | 42127 | 3.1%
standard | 41426 | 3.0%
maisonette | 28656 | 2.1%
Other values (14) | 19407 | 1.4%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

lease_commence_date
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct54
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1987.764496
Minimum1966
Maximum2019
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size6.7 MiB

Quantile statistics

Minimum: 1966
5-th percentile: 1973
Q1: 1980
Median: 1986
Q3: 1995
95-th percentile: 2005
Maximum: 2019
Range: 53
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 10.12490047
Coefficient of variation (CV): 0.005093611687
Kurtosis: 0.1887217129
Mean: 1987.764496
Median Absolute Deviation (MAD): 7
Skewness: 0.5673714838
Sum: 1750918381
Variance: 102.5136095
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most common values

Value | Count | Frequency (%)
1985 | 84941 | 9.6%
1984 | 61582 | 7.0%
1988 | 49161 | 5.6%
1987 | 41559 | 4.7%
1978 | 40185 | 4.6%
1986 | 37495 | 4.3%
1989 | 31215 | 3.5%
1980 | 31144 | 3.5%
1979 | 30090 | 3.4%
1997 | 29729 | 3.4%
Other values (44) | 443747 | 50.4%

Smallest values

Value | Count | Frequency (%)
1966 | 30 | < 0.1%
1967 | 5988 | 0.7%
1968 | 1838 | 0.2%
1969 | 8160 | 0.9%
1970 | 11090 | 1.3%
1971 | 7687 | 0.9%
1972 | 5649 | 0.6%
1973 | 8378 | 1.0%
1974 | 14133 | 1.6%
1975 | 16872 | 1.9%

Largest values

Value | Count | Frequency (%)
2019 | 43 | < 0.1%
2018 | 1219 | 0.1%
2017 | 3474 | 0.4%
2016 | 4998 | 0.6%
2015 | 8307 | 0.9%
2014 | 3043 | 0.3%
2013 | 4568 | 0.5%
2012 | 4248 | 0.5%
2011 | 2349 | 0.3%
2010 | 1201 | 0.1%

resale_price
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct9005
Distinct (%)1.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean307367.7417
Minimum5000
Maximum1418000
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size6.7 MiB

Quantile statistics

Minimum: 5000
5-th percentile: 87035
Q1: 188000
Median: 286000
Q3: 400000
95-th percentile: 600000
Maximum: 1418000
Range: 1413000
Interquartile range (IQR): 212000

Descriptive statistics

Standard deviation: 159259.3219
Coefficient of variation (CV): 0.51813935
Kurtosis: 1.233728232
Mean: 307367.7417
Median Absolute Deviation (MAD): 105500
Skewness: 0.8818419284
Sum: 2.707442605 × 10^11
Variance: 2.53635316 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most common values

Value | Count | Frequency (%)
300000 | 6752 | 0.8%
280000 | 6503 | 0.7%
350000 | 6494 | 0.7%
320000 | 6334 | 0.7%
250000 | 6262 | 0.7%
260000 | 5980 | 0.7%
400000 | 5957 | 0.7%
380000 | 5872 | 0.7%
360000 | 5865 | 0.7%
330000 | 5818 | 0.7%
Other values (8995) | 819011 | 93.0%

Smallest values

Value | Count | Frequency (%)
5000 | 1 | < 0.1%
5600 | 1 | < 0.1%
5700 | 1 | < 0.1%
5800 | 1 | < 0.1%
6000 | 5 | < 0.1%
6700 | 1 | < 0.1%
7000 | 13 | < 0.1%
7300 | 31 | < 0.1%
7500 | 14 | < 0.1%
7600 | 1 | < 0.1%

Largest values

Value | Count | Frequency (%)
1418000 | 1 | < 0.1%
1400000 | 1 | < 0.1%
1388888.88 | 1 | < 0.1%
1380000 | 1 | < 0.1%
1360000 | 1 | < 0.1%
1350000 | 1 | < 0.1%
1348888 | 1 | < 0.1%
1338888 | 1 | < 0.1%
1328000 | 1 | < 0.1%
1310000 | 1 | < 0.1%

remaining_lease
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing709050
Missing (%)80.5%
Memory size6.7 MiB
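
The report does not say why remaining_lease is unsupported. The sample rows show it holding either NaN or strings such as "49 years 01 month", so one plausible cleaning step is to convert every cell to a number of months. The sketch below assumes that string format; the handling of bare numbers as whole years is a further assumption, and `lease_to_months` is a hypothetical helper name.

```python
import re
from typing import Optional

# Matches strings such as "49 years 01 month" or "70 years" (assumed format).
LEASE_PATTERN = re.compile(r"(\d+)\s+years?(?:\s+(\d+)\s+months?)?")


def lease_to_months(value) -> Optional[int]:
    """Convert a remaining_lease cell to whole months, or None if missing."""
    if value is None:
        return None
    if isinstance(value, float):
        # NaN (value != value) marks a missing cell.
        # Assumption: a finite float is a number of years.
        return None if value != value else int(round(value * 12))
    if isinstance(value, int):
        # Assumption: a bare integer is a number of whole years.
        return value * 12
    match = LEASE_PATTERN.fullmatch(str(value).strip())
    if match is None:
        return None  # unrecognised text is treated as missing
    years = int(match.group(1))
    months = int(match.group(2) or 0)
    return years * 12 + months


for cell in ["49 years 01 month", "95 years 02 months", float("nan")]:
    print(repr(cell), "->", lease_to_months(cell))
```

Once the column is numeric it can be profiled like the other real-valued variables, although 80.5% of its cells would still be missing.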

Interactions


Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
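
The calculation described above can be sketched in plain Python as follows. This is an illustration, not the report's implementation: it assumes tied values receive the average of their ranks, and the helper names `average_ranks` and `spearman_rho` are hypothetical. The example inputs are the floor_area_sqm and resale_price values of the first five "Last rows" entries.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based positions
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance of the rank variables divided by the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in rx) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in ry) / n)
    return covariance / (std_x * std_y)


area = [67.0, 56.0, 144.0, 123.0, 104.0]
price = [328000.0, 320000.0, 738000.0, 525000.0, 451000.0]
print(spearman_rho(area, price))
```

In these five rows price rises with area in exactly the same order, so the two rank vectors coincide; five rows say nothing about the full dataset.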

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
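
A minimal sketch of this formula, together with a check of the invariance property mentioned above, is shown here. It is not the report's implementation; `pearson_r` is a hypothetical helper, the inputs are the floor_area_sqm and resale_price values of the first five "First rows" entries, and the rescaling constants are arbitrary.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return covariance / (std_x * std_y)


area = [70.0, 96.0, 91.0, 97.0, 90.0]
price = [23400.0, 70000.0, 72000.0, 60000.0, 72000.0]

r = pearson_r(area, price)

# Shift and rescale each variable separately (positive scale factors):
# r should not change.
area_rescaled = [a * 10.0 + 5.0 for a in area]
price_rescaled = [p / 1000.0 for p in price]
r_rescaled = pearson_r(area_rescaled, price_rescaled)

print(r, r_rescaled)
```

The two printed values should agree up to floating-point error, which illustrates the location and scale invariance.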

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
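
The pair-counting description above corresponds to the variant usually called tau-a. The sketch below implements that variant directly; libraries often apply a tie adjustment instead, so results on tied data may differ from the report's figures. `kendall_tau_a` is a hypothetical helper, and the loop is quadratic in the number of observations, so it is meant for illustration only.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
        # direction == 0 means a tie; it counts toward neither group
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


# floor_area_sqm and resale_price from the ten "Last rows" entries.
area = [67.0, 56.0, 144.0, 123.0, 104.0, 92.0, 91.0, 90.0, 93.0, 113.0]
price = [328000.0, 320000.0, 738000.0, 525000.0, 451000.0,
         565000.0, 465000.0, 460000.0, 520000.0, 655000.0]
print(kendall_tau_a(area, price))
```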

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. The report uses the bias-corrected measure proposed by Bergsma in 2013.
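
A bias-corrected Cramér's V can be sketched as follows, using the correction commonly attributed to Bergsma (2013). Whether the report's software computes it in exactly this way is an assumption; `cramers_v_corrected` is a hypothetical helper, and the example columns are the flat_type and flat_model values of the ten "First rows" entries.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Sequence


def cramers_v_corrected(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Bias-corrected Cramér's V for two nominal variables of equal length."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for row, row_n in row_totals.items():
        for col, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((row, col), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Bias correction (assumed to follow Bergsma, 2013).
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denominator = min(r_corr - 1, k_corr - 1)
    if denominator <= 0:
        return 0.0  # a constant column carries no association
    return sqrt(phi2_corr / denominator)


flat_type = ["3 ROOM"] + ["4 ROOM"] * 9
flat_model = ["IMPROVED"] + ["NEW GENERATION"] * 6 + ["MODEL A"] * 3
print(cramers_v_corrected(flat_type, flat_model))
```

Ten rows are far too few for a meaningful estimate; the call only shows the expected input shape.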

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation for Phik is available separately.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

index | month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | resale_price | remaining_lease
0 | 1990-01 | JURONG WEST | 3 ROOM | 172 | BOON LAY DR | 04 TO 06 | 70.0 | IMPROVED | 1974 | 23400.0 | NaN
1 | 1990-01 | JURONG EAST | 4 ROOM | 322 | JURONG EAST ST 31 | 07 TO 09 | 96.0 | NEW GENERATION | 1985 | 70000.0 | NaN
2 | 1990-01 | JURONG EAST | 4 ROOM | 218 | JURONG EAST ST 21 | 10 TO 12 | 91.0 | NEW GENERATION | 1984 | 72000.0 | NaN
3 | 1990-01 | JURONG EAST | 4 ROOM | 408 | PANDAN GDNS | 01 TO 03 | 97.0 | NEW GENERATION | 1978 | 60000.0 | NaN
4 | 1990-01 | JURONG EAST | 4 ROOM | 251 | JURONG EAST ST 24 | 04 TO 06 | 90.0 | NEW GENERATION | 1982 | 72000.0 | NaN
5 | 1990-01 | JURONG EAST | 4 ROOM | 251 | JURONG EAST ST 24 | 04 TO 06 | 90.0 | NEW GENERATION | 1982 | 87000.0 | NaN
6 | 1990-01 | JURONG EAST | 4 ROOM | 232 | JURONG EAST ST 21 | 07 TO 09 | 98.0 | NEW GENERATION | 1982 | 71000.0 | NaN
7 | 1990-01 | JURONG EAST | 4 ROOM | 316 | JURONG EAST ST 32 | 04 TO 06 | 104.0 | MODEL A | 1984 | 88000.0 | NaN
8 | 1990-01 | JURONG EAST | 4 ROOM | 316 | JURONG EAST ST 32 | 10 TO 12 | 123.0 | MODEL A | 1984 | 75000.0 | NaN
9 | 1990-01 | JURONG EAST | 4 ROOM | 316 | JURONG EAST ST 32 | 01 TO 03 | 105.0 | MODEL A | 1984 | 88000.0 | NaN

Last rows

index | month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | resale_price | remaining_lease
880838 | 2022-09 | GEYLANG | 3 ROOM | 91 | PAYA LEBAR WAY | 10 TO 12 | 67.0 | IMPROVED | 1972 | 328000.0 | 49 years 01 month
880839 | 2022-09 | GEYLANG | 3 ROOM | 60 | CIRCUIT RD | 04 TO 06 | 56.0 | STANDARD | 1969 | 320000.0 | 45 years 05 months
880840 | 2022-09 | CHOA CHU KANG | EXECUTIVE | 134 | TECK WHYE LANE | 10 TO 12 | 144.0 | APARTMENT | 1993 | 738000.0 | 69 years 10 months
880841 | 2022-09 | CHOA CHU KANG | 5 ROOM | 453 | CHOA CHU KANG AVE 4 | 07 TO 09 | 123.0 | PREMIUM APARTMENT | 2000 | 525000.0 | 76 years 06 months
880842 | 2022-09 | CHOA CHU KANG | 4 ROOM | 5 | TECK WHYE AVE | 04 TO 06 | 104.0 | MODEL A | 1984 | 451000.0 | 60 years 08 months
880843 | 2022-09 | CHOA CHU KANG | 4 ROOM | 816B | KEAT HONG LINK | 13 TO 15 | 92.0 | MODEL A | 2017 | 565000.0 | 94 years 01 month
880844 | 2022-09 | CHOA CHU KANG | 4 ROOM | 691A | CHOA CHU KANG CRES | 16 TO 18 | 91.0 | MODEL A | 2003 | 465000.0 | 79 years 09 months
880845 | 2022-09 | CHOA CHU KANG | 4 ROOM | 684D | CHOA CHU KANG CRES | 13 TO 15 | 90.0 | MODEL A | 2002 | 460000.0 | 79 years 01 month
880846 | 2022-09 | CHOA CHU KANG | 4 ROOM | 487C | CHOA CHU KANG AVE 5 | 13 TO 15 | 93.0 | MODEL A | 2016 | 520000.0 | 92 years 10 months
880847 | 2022-09 | YISHUN | 5 ROOM | 677C | YISHUN RING RD | 04 TO 06 | 113.0 | IMPROVED | 2018 | 655000.0 | 95 years 02 months

Duplicate rows

Most frequently occurring

index | month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | resale_price | # duplicates
1231 | 2009-02 | PASIR RIS | 3 ROOM | 5 | CHANGI VILLAGE RD | 01 TO 03 | 66.0 | IMPROVED | 1981 | 177000.0 | 5
1228 | 2009-01 | PASIR RIS | 3 ROOM | 5 | CHANGI VILLAGE RD | 04 TO 06 | 66.0 | IMPROVED | 1981 | 177000.0 | 4
1233 | 2009-02 | PASIR RIS | 3 ROOM | 5 | CHANGI VILLAGE RD | 04 TO 06 | 66.0 | IMPROVED | 1981 | 177000.0 | 4
1240 | 2009-04 | PASIR RIS | 3 ROOM | 5 | CHANGI VILLAGE RD | 04 TO 06 | 66.0 | IMPROVED | 1981 | 177000.0 | 4
48 | 1990-08 | TOA PAYOH | 3 ROOM | 195 | KIM KEAT AVE | 04 TO 06 | 66.0 | IMPROVED | 1973 | 40000.0 | 3
66 | 1990-12 | ANG MO KIO | 4 ROOM | 334 | ANG MO KIO AVE 1 | 07 TO 09 | 91.0 | NEW GENERATION | 1982 | 80000.0 | 3
72 | 1990-12 | TOA PAYOH | 3 ROOM | 195 | KIM KEAT AVE | 07 TO 09 | 66.0 | IMPROVED | 1973 | 40000.0 | 3
86 | 1991-02 | ANG MO KIO | 3 ROOM | 343 | ANG MO KIO AVE 3 | 04 TO 06 | 73.0 | NEW GENERATION | 1978 | 47000.0 | 3
180 | 1992-09 | CLEMENTI | 3 ROOM | 714 | CLEMENTI WEST ST 2 | 07 TO 09 | 67.0 | NEW GENERATION | 1980 | 47000.0 | 3
332 | 1995-09 | BEDOK | 3 ROOM | 609 | BEDOK RESERVOIR RD | 04 TO 06 | 67.0 | NEW GENERATION | 1982 | 120000.0 | 3
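
A table like this one can be approximated by counting how often each full row occurs. The sketch below does this with the standard library. It assumes rows are compared on the ten columns shown above (remaining_lease does not appear in the table), and it is not necessarily how the report computes its duplicate count; `duplicate_summary` is a hypothetical helper and the second example row is made up for contrast.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def duplicate_summary(rows: Iterable[Tuple]) -> Dict[Tuple, int]:
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


# One row copied from the table above, repeated three times.
repeated = ("1990-08", "TOA PAYOH", "3 ROOM", "195", "KIM KEAT AVE",
            "04 TO 06", 66.0, "IMPROVED", 1973, 40000.0)
# A made-up row that occurs only once (illustrative, not from the dataset).
single = ("1990-08", "TOA PAYOH", "3 ROOM", "195", "KIM KEAT AVE",
          "10 TO 12", 66.0, "IMPROVED", 1973, 41000.0)

rows = [repeated, single, repeated, repeated]
for row, n in duplicate_summary(rows).items():
    print(n, row)
```

Identical rows in resale data may be genuine separate transactions rather than errors, so whether to drop them is a judgment call that the report leaves open.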