Dataset statistics
Number of variables | 11 |
---|---|
Number of observations | 880848 |
Missing cells | 709050 |
Missing cells (%) | 7.3% |
Duplicate rows | 1907 |
Duplicate rows (%) | 0.2% |
Total size in memory | 73.9 MiB |
Average record size in memory | 88.0 B |
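As a sanity check, the percentages in this table can be re-derived from the raw counts. The sketch below uses only figures copied from the table; the variable names are illustrative.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 11
n_observations = 880_848
missing_cells = 709_050
duplicate_rows = 1_907
total_size_mib = 73.9

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations
# 1 MiB = 1024 * 1024 bytes
avg_record_bytes = total_size_mib * 1024 ** 2 / n_observations

print(f"total cells:      {total_cells}")
print(f"missing cells:    {missing_pct:.1f}%")
print(f"duplicate rows:   {duplicate_pct:.1f}%")
print(f"avg record size:  {avg_record_bytes:.1f} B")
```

The rounded results agree with the 7.3%, 0.2% and 88.0 B reported above.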
Variable types
Categorical | 7 |
---|---|
Numeric | 3 |
Unsupported | 1 |
Alerts
Dataset has 1907 (0.2%) duplicate rows | Duplicates |
month has a high cardinality: 393 distinct values | High cardinality |
block has a high cardinality: 2616 distinct values | High cardinality |
street_name has a high cardinality: 579 distinct values | High cardinality |
floor_area_sqm is highly correlated with lease_commence_date and 1 other fields | High correlation |
lease_commence_date is highly correlated with floor_area_sqm and 1 other fields | High correlation |
resale_price is highly correlated with floor_area_sqm and 1 other fields | High correlation |
floor_area_sqm is highly correlated with resale_price | High correlation |
lease_commence_date is highly correlated with resale_price | High correlation |
resale_price is highly correlated with floor_area_sqm and 1 other fields | High correlation |
flat_type is highly correlated with flat_model | High correlation |
flat_model is highly correlated with flat_type | High correlation |
town is highly correlated with flat_model and 1 other fields | High correlation |
flat_type is highly correlated with floor_area_sqm and 1 other fields | High correlation |
floor_area_sqm is highly correlated with flat_type and 3 other fields | High correlation |
flat_model is highly correlated with town and 4 other fields | High correlation |
lease_commence_date is highly correlated with town and 3 other fields | High correlation |
resale_price is highly correlated with floor_area_sqm and 2 other fields | High correlation |
remaining_lease has 709050 (80.5%) missing values | Missing |
remaining_lease is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
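The `remaining_lease` alert suggests the column needs cleaning before it can be profiled. Judging from the last rows of the sample, populated values look like `49 years 01 month`, while older records are `NaN`. A minimal sketch of one possible conversion to a month count follows; the helper name `lease_to_months` and the assumption that every populated value matches this pattern are illustrative, not part of the report.

```python
import math
import re

# Matches "49 years 01 month", "95 years 02 months", or just "61 years".
_LEASE_RE = re.compile(r"(\d+)\s+years?(?:\s+(\d+)\s+months?)?")


def lease_to_months(value):
    """Convert a lease string to total months; return None if missing or unparsable."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    match = _LEASE_RE.fullmatch(str(value).strip())
    if match is None:
        return None
    years = int(match.group(1))
    months = int(match.group(2) or 0)
    return years * 12 + months


for raw in ("49 years 01 month", "95 years 02 months", float("nan")):
    print(repr(raw), "->", lease_to_months(raw))
```

Values that do not match the pattern come back as `None` rather than raising, so they can be inspected separately.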
Reproduction
Analysis started | 2022-11-05 03:40:26.524991 |
---|---|
Analysis finished | 2022-11-05 03:40:40.617995 |
Duration | 14.09 seconds |
Software version | pandas-profiling v3.1.0 |
Download configuration | config.json |
month
Categorical
Distinct | 393 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 6.7 MiB |
1999-03 | 6465 |
---|---|
1999-06 | 5861 |
1998-10 | 5709 |
1999-04 | 5698 |
1999-05 | 5671 |
Other values (388) | 851444 |
Length
Max length | 7 |
---|---|
Median length | 7 |
Mean length | 7 |
Min length | 7 |
Characters and Unicode
Total characters | 0 |
---|---|
Distinct characters | 0 |
Distinct categories | 0 |
Distinct scripts | 0 |
Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 1990-01 |
---|---|
2nd row | 1990-01 |
3rd row | 1990-01 |
4th row | 1990-01 |
5th row | 1990-01 |
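The count of 393 distinct months is what one would expect if every month between 1990-01 and 2022-09 appears at least once. The sketch below checks that arithmetic; treating those two values as the earliest and latest months is an assumption based on the first and last rows shown in the sample section.

```python
def months_between_inclusive(first, last):
    """Number of calendar months from `first` to `last` ('YYYY-MM'), both included."""
    first_year, first_month = (int(part) for part in first.split("-"))
    last_year, last_month = (int(part) for part in last.split("-"))
    return (last_year - first_year) * 12 + (last_month - first_month) + 1


expected_months = months_between_inclusive("1990-01", "2022-09")
print(expected_months)
```

If the result matches the distinct count, there are no gaps in the monthly series under that assumption.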
Common Values
Value | Count | Frequency (%) |
1999-03 | 6465 | 0.7% |
1999-06 | 5861 | 0.7% |
1998-10 | 5709 | 0.6% |
1999-04 | 5698 | 0.6% |
1999-05 | 5671 | 0.6% |
1999-07 | 5493 | 0.6% |
1999-08 | 5209 | 0.6% |
1998-11 | 4993 | 0.6% |
1998-12 | 4988 | 0.6% |
1999-02 | 4834 | 0.5% |
Other values (383) | 825927 | 93.8% |
Length
Histogram of lengths of the category
Words
Value | Count | Frequency (%) |
1999-03 | 6465 | 0.7% |
1999-06 | 5861 | 0.7% |
1998-10 | 5709 | 0.6% |
1999-04 | 5698 | 0.6% |
1999-05 | 5671 | 0.6% |
1999-07 | 5493 | 0.6% |
1999-08 | 5209 | 0.6% |
1998-11 | 4993 | 0.6% |
1998-12 | 4988 | 0.6% |
1999-02 | 4834 | 0.5% |
Other values (383) | 825927 | 93.8% |
Most occurring characters
Value | Count | Frequency (%) |
No values found. |
Most occurring categories
Value | Count | Frequency (%) |
No values found. |
Most frequent character per category
Most occurring scripts
Value | Count | Frequency (%) |
No values found. |
Most frequent character per script
Most occurring blocks
Value | Count | Frequency (%) |
No values found. |
Most frequent character per block
town
Categorical
Distinct | 27 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 6.7 MiB |
TAMPINES | 76922 |
---|---|
YISHUN | 66779 |
BEDOK | 64355 |
JURONG WEST | 63704 |
WOODLANDS | 61874 |
Other values (22) | 547214 |
Length
Max length | 15 |
---|---|
Median length | 9 |
Mean length | 9.029695248 |
Min length | 5 |
Characters and Unicode
Total characters | 0 |
---|---|
Distinct characters | 0 |
Distinct categories | 0 |
Distinct scripts | 0 |
Distinct blocks | 0 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | JURONG WEST |
---|---|
2nd row | JURONG EAST |
3rd row | JURONG EAST |
4th row | JURONG EAST |
5th row | JURONG EAST |
Common Values
Value | Count | Frequency (%) |
TAMPINES | 76922 | 8.7% |
YISHUN | 66779 | 7.6% |
BEDOK | 64355 | 7.3% |
JURONG WEST | 63704 | 7.2% |
WOODLANDS | 61874 | 7.0% |
ANG MO KIO | 50307 | 5.7% |
HOUGANG | 48257 | 5.5% |
BUKIT BATOK | 41969 | 4.8% |
CHOA CHU KANG | 36180 | 4.1% |
BUKIT MERAH | 32650 | 3.7% |
Other values (17) | 337851 | 38.4% |
Length
Histogram of lengths of the category
Words
Value | Count | Frequency (%) |
bukit | 103322 | 7.8% |
jurong | 87623 | 6.6% |
tampines | 76922 | 5.8% |
yishun | 66779 | 5.1% |
bedok | 64355 | 4.9% |
west | 63704 | 4.8% |
woodlands | 61874 | 4.7% |
ang | 50307 | 3.8% |
mo | 50307 | 3.8% |
kio | 50307 | 3.8% |
Other values (27) | 646239 | 48.9% |
flat_type
Categorical
Distinct | 8 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 6.7 MiB |
4 ROOM | 332214 |
---|---|
3 ROOM | 284617 |
5 ROOM | 184725 |
EXECUTIVE | 66837 |
2 ROOM | 10632 |
Other values (3) | 1823 |
Length
Max length | 16 |
---|---|
Median length | 6 |
Mean length | 6.233662334 |
Min length | 6 |
Characters and Unicode
Total characters | 0 |
---|---|
Distinct characters | 0 |
Distinct categories | 0 |
Distinct scripts | 0 |
Distinct blocks | 0 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 3 ROOM |
---|---|
2nd row | 4 ROOM |
3rd row | 4 ROOM |
4th row | 4 ROOM |
5th row | 4 ROOM |
Common Values
Value | Count | Frequency (%) |
4 ROOM | 332214 | 37.7% |
3 ROOM | 284617 | 32.3% |
5 ROOM | 184725 | 21.0% |
EXECUTIVE | 66837 | 7.6% |
2 ROOM | 10632 | 1.2% |
1 ROOM | 1292 | 0.1% |
MULTI GENERATION | 279 | < 0.1% |
MULTI-GENERATION | 252 | < 0.1% |
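The last two rows differ only by a hyphen, which suggests the same flat type was recorded under two spellings. A minimal sketch of merging them is shown below, using the counts from the table; the function name `normalize_label` and the choice to replace hyphens with spaces are assumptions, and whether the two labels really denote the same category should be confirmed against the data source.

```python
from collections import Counter

# Counts copied from the Common Values table.
flat_type_counts = {
    "4 ROOM": 332214,
    "3 ROOM": 284617,
    "5 ROOM": 184725,
    "EXECUTIVE": 66837,
    "2 ROOM": 10632,
    "1 ROOM": 1292,
    "MULTI GENERATION": 279,
    "MULTI-GENERATION": 252,
}


def normalize_label(label):
    """Upper-case, replace hyphens with spaces, and collapse repeated whitespace."""
    return " ".join(label.upper().replace("-", " ").split())


merged = Counter()
for label, count in flat_type_counts.items():
    merged[normalize_label(label)] += count

print(merged["MULTI GENERATION"])  # 531
print(len(merged))                 # 7
```

After merging, the variable would have 7 distinct values instead of 8.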
Length
Histogram of lengths of the category
Pie chart
Words
Value | Count | Frequency (%) |
room | 813480 | 48.0% |
4 | 332214 | 19.6% |
3 | 284617 | 16.8% |
5 | 184725 | 10.9% |
executive | 66837 | 3.9% |
2 | 10632 | 0.6% |
1 | 1292 | 0.1% |
multi | 279 | < 0.1% |
generation | 279 | < 0.1% |
multi-generation | 252 | < 0.1% |
block
Categorical
Distinct | 2616 |
---|---|
Distinct (%) | 0.3% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 6.7 MiB |
2 | 4475 |
---|---|
1 | 3919 |
110 | 3307 |
101 | 3301 |
4 | 3245 |
Other values (2611) | 862601 |
Length
Max length | 4 |
---|---|
Median length | 3 |
Mean length | 2.918779403 |
Min length | 1 |
Characters and Unicode
Total characters | 0 |
---|---|
Distinct characters | 0 |
Distinct categories | 0 |
Distinct scripts | 0 |
Distinct blocks | 0 |
Unique
Unique | 8 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | 172 |
---|---|
2nd row | 322 |
3rd row | 218 |
4th row | 408 |
5th row | 251 |
Common Values
Value | Count | Frequency (%) |
2 | 4475 | 0.5% |
1 | 3919 | 0.4% |
110 | 3307 | 0.4% |
101 | 3301 | 0.4% |
4 | 3245 | 0.4% |
8 | 3208 | 0.4% |
113 | 3192 | 0.4% |
107 | 3121 | 0.4% |
3 | 3111 | 0.4% |
114 | 3096 | 0.4% |
Other values (2606) | 846873 | 96.1% |
Length
Histogram of lengths of the category
Words
Value | Count | Frequency (%) |
2 | 4475 | 0.5% |
1 | 3919 | 0.4% |
110 | 3307 | 0.4% |
101 | 3301 | 0.4% |
4 | 3245 | 0.4% |
8 | 3208 | 0.4% |
113 | 3192 | 0.4% |
107 | 3121 | 0.4% |
3 | 3111 | 0.4% |
114 | 3096 | 0.4% |
Other values (2606) | 846873 | 96.1% |
street_name
Categorical
Distinct | 579 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 6.7 MiB |
YISHUN RING RD | 16891 |
---|---|
BEDOK RESERVOIR RD | 14282 |
ANG MO KIO AVE 10 | 13401 |
ANG MO KIO AVE 3 | 11833 |
HOUGANG AVE 8 | 9036 |
Other values (574) | 815405 |
Length
Max length | 22 |
---|---|
Median length | 14 |
Mean length | 14.01041042 |
Min length | 7 |
Characters and Unicode
Total characters | 0 |
---|---|
Distinct characters | 0 |
Distinct categories | 0 |
Distinct scripts | 0 |
Distinct blocks | 0 |
Unique
Unique | 1 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | BOON LAY DR |
---|---|
2nd row | JURONG EAST ST 31 |
3rd row | JURONG EAST ST 21 |
4th row | PANDAN GDNS |
5th row | JURONG EAST ST 24 |
Common Values
Value | Count | Frequency (%) |
YISHUN RING RD | 16891 | 1.9% |
BEDOK RESERVOIR RD | 14282 | 1.6% |
ANG MO KIO AVE 10 | 13401 | 1.5% |
ANG MO KIO AVE 3 | 11833 | 1.3% |
HOUGANG AVE 8 | 9036 | 1.0% |
TAMPINES ST 21 | 8042 | 0.9% |
BEDOK NTH ST 3 | 7340 | 0.8% |
BEDOK NTH RD | 7246 | 0.8% |
ANG MO KIO AVE 4 | 7013 | 0.8% |
MARSILING DR | 6444 | 0.7% |
Other values (569) | 779320 | 88.5% |
Length
Histogram of lengths of the category
Words
Value | Count | Frequency (%) |
st | 275144 | 9.7% |
ave | 224615 | 7.9% |
rd | 149428 | 5.3% |
west | 70430 | 2.5% |
dr | 68235 | 2.4% |
tampines | 68075 | 2.4% |
yishun | 66779 | 2.4% |
jurong | 63628 | 2.2% |
1 | 53029 | 1.9% |
bedok | 52407 | 1.9% |
Other values (320) | 1738152 | 61.4% |
storey_range
Categorical
Distinct | 25 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 6.7 MiB |
04 TO 06 | 222011 |
---|---|
07 TO 09 | 200273 |
01 TO 03 | 178379 |
10 TO 12 | 170193 |
13 TO 15 | 57640 |
Other values (20) | 52352 |
Length
Max length | 8 |
---|---|
Median length | 8 |
Mean length | 8 |
Min length | 8 |
Characters and Unicode
Total characters | 0 |
---|---|
Distinct characters | 0 |
Distinct categories | 0 |
Distinct scripts | 0 |
Distinct blocks | 0 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 04 TO 06 |
---|---|
2nd row | 07 TO 09 |
3rd row | 10 TO 12 |
4th row | 01 TO 03 |
5th row | 04 TO 06 |
Common Values
Value | Count | Frequency (%) |
04 TO 06 | 222011 | 25.2% |
07 TO 09 | 200273 | 22.7% |
01 TO 03 | 178379 | 20.3% |
10 TO 12 | 170193 | 19.3% |
13 TO 15 | 57640 | 6.5% |
16 TO 18 | 22092 | 2.5% |
19 TO 21 | 10516 | 1.2% |
22 TO 24 | 6848 | 0.8% |
25 TO 27 | 3027 | 0.3% |
01 TO 05 | 2700 | 0.3% |
Other values (15) | 7169 | 0.8% |
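Although `storey_range` is profiled as categorical, each label encodes a numeric interval. Note that `01 TO 05` appears alongside `01 TO 03`, so the labels seem to come from more than one binning scheme. A small sketch of turning a label into numeric bounds and a midpoint is given below; the helper names are hypothetical and the `" TO "` separator is assumed from the values listed above.

```python
def parse_storey_range(label):
    """Split a label such as '04 TO 06' into integer (low, high) bounds."""
    low, high = label.split(" TO ")
    return int(low), int(high)


def storey_midpoint(label):
    """Midpoint of the storey interval, usable as a rough numeric feature."""
    low, high = parse_storey_range(label)
    return (low + high) / 2


for label in ("01 TO 03", "01 TO 05", "04 TO 06"):
    print(label, parse_storey_range(label), storey_midpoint(label))
```

Using the midpoint makes overlapping bins comparable, at the cost of treating every flat in a bin as if it sat on the same floor.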
Length
Histogram of lengths of the category
Words
Value | Count | Frequency (%) |
to | 880848 | 33.3% |
06 | 224485 | 8.5% |
04 | 222011 | 8.4% |
07 | 200273 | 7.6% |
09 | 200273 | 7.6% |
01 | 181079 | 6.9% |
03 | 178379 | 6.8% |
10 | 172667 | 6.5% |
12 | 170193 | 6.4% |
15 | 58899 | 2.2% |
Other values (30) | 153437 | 5.8% |
floor_area_sqm
Numeric
Distinct | 209 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 95.70626726 |
Minimum | 28 |
---|---|
Maximum | 307 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 6.7 MiB |
Quantile statistics
Minimum | 28 |
---|---|
5-th percentile | 60 |
Q1 | 73 |
median | 93 |
Q3 | 113 |
95-th percentile | 145 |
Maximum | 307 |
Range | 279 |
Interquartile range (IQR) | 40 |
Descriptive statistics
Standard deviation | 25.93102825 |
---|---|
Coefficient of variation (CV) | 0.270943889 |
Kurtosis | -0.3630683132 |
Mean | 95.70626726 |
Median Absolute Deviation (MAD) | 20 |
Skewness | 0.3703632387 |
Sum | 84302674.1 |
Variance | 672.4182262 |
Monotonicity | Not monotonic |
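Several of the descriptive statistics are simple functions of the others, which makes them easy to cross-check. The sketch below recomputes the coefficient of variation, interquartile range, range and variance from the values reported above; the variable names are illustrative.

```python
# Values copied from the quantile and descriptive statistics tables.
mean = 95.70626726
std = 25.93102825
q1, q3 = 73, 113
minimum, maximum = 28, 307

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance = std ** 2              # variance is the squared standard deviation

print(f"CV = {cv:.9f}")
print(f"IQR = {iqr}")
print(f"Range = {value_range}")
print(f"Variance = {variance:.4f}")
```

The recomputed values agree with the reported CV of 0.270943889, IQR of 40, range of 279 and variance of about 672.418.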
Histogram with fixed size bins (bins=50)
Common Values
Value | Count | Frequency (%) |
67 | 66600 | 7.6% |
104 | 45675 | 5.2% |
68 | 37494 | 4.3% |
84 | 35086 | 4.0% |
121 | 28465 | 3.2% |
92 | 27785 | 3.2% |
73 | 27374 | 3.1% |
91 | 25908 | 2.9% |
65 | 25776 | 2.9% |
103 | 25617 | 2.9% |
Other values (199) | 535068 | 60.7% |
Minimum 10 values
Value | Count | Frequency (%) |
28 | 33 | < 0.1% |
29 | 420 | < 0.1% |
31 | 839 | 0.1% |
34 | 75 | < 0.1% |
35 | 22 | < 0.1% |
37 | 12 | < 0.1% |
38 | 154 | < 0.1% |
39 | 146 | < 0.1% |
40 | 744 | 0.1% |
41 | 368 | < 0.1% |
Maximum 10 values
Value | Count | Frequency (%) |
307 | 1 | < 0.1% |
297 | 2 | < 0.1% |
280 | 4 | < 0.1% |
266 | 4 | < 0.1% |
261 | 6 | < 0.1% |
259 | 2 | < 0.1% |
250 | 3 | < 0.1% |
249 | 3 | < 0.1% |
246 | 2 | < 0.1% |
243 | 16 | < 0.1% |
flat_model
Categorical
Distinct | 21 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 6.7 MiB |
MODEL A | 246730 |
---|---|
IMPROVED | 230874 |
NEW GENERATION | 183648 |
SIMPLIFIED | 55842 |
PREMIUM APARTMENT | 41949 |
Other values (16) | 121805 |
Length
Max length | 22 |
---|---|
Median length | 8 |
Mean length | 9.653708699 |
Min length | 4 |
Characters and Unicode
Total characters | 0 |
---|---|
Distinct characters | 0 |
Distinct categories | 0 |
Distinct scripts | 0 |
Distinct blocks | 0 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | IMPROVED |
---|---|
2nd row | NEW GENERATION |
3rd row | NEW GENERATION |
4th row | NEW GENERATION |
5th row | NEW GENERATION |
Common Values
Value | Count | Frequency (%) |
MODEL A | 246730 | 28.0% |
IMPROVED | 230874 | 26.2% |
NEW GENERATION | 183648 | 20.8% |
SIMPLIFIED | 55842 | 6.3% |
PREMIUM APARTMENT | 41949 | 4.8% |
STANDARD | 41426 | 4.7% |
APARTMENT | 34070 | 3.9% |
MAISONETTE | 28570 | 3.2% |
MODEL A2 | 9631 | 1.1% |
DBSS | 2785 | 0.3% |
Other values (11) | 5323 | 0.6% |
Length
Histogram of lengths of the category
Words
Value | Count | Frequency (%) |
model | 258361 | 18.9% |
a | 246730 | 18.0% |
improved | 230874 | 16.9% |
generation | 184179 | 13.5% |
new | 183648 | 13.4% |
apartment | 76111 | 5.6% |
simplified | 55842 | 4.1% |
premium | 42127 | 3.1% |
standard | 41426 | 3.0% |
maisonette | 28656 | 2.1% |
Other values (14) | 19407 | 1.4% |
lease_commence_date
Numeric
Distinct | 54 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 1987.764496 |
Minimum | 1966 |
---|---|
Maximum | 2019 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 6.7 MiB |
Quantile statistics
Minimum | 1966 |
---|---|
5-th percentile | 1973 |
Q1 | 1980 |
median | 1986 |
Q3 | 1995 |
95-th percentile | 2005 |
Maximum | 2019 |
Range | 53 |
Interquartile range (IQR) | 15 |
Descriptive statistics
Standard deviation | 10.12490047 |
---|---|
Coefficient of variation (CV) | 0.005093611687 |
Kurtosis | 0.1887217129 |
Mean | 1987.764496 |
Median Absolute Deviation (MAD) | 7 |
Skewness | 0.5673714838 |
Sum | 1750918381 |
Variance | 102.5136095 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
Value | Count | Frequency (%) |
1985 | 84941 | 9.6% |
1984 | 61582 | 7.0% |
1988 | 49161 | 5.6% |
1987 | 41559 | 4.7% |
1978 | 40185 | 4.6% |
1986 | 37495 | 4.3% |
1989 | 31215 | 3.5% |
1980 | 31144 | 3.5% |
1979 | 30090 | 3.4% |
1997 | 29729 | 3.4% |
Other values (44) | 443747 | 50.4% |
Minimum 10 values
Value | Count | Frequency (%) |
1966 | 30 | < 0.1% |
1967 | 5988 | 0.7% |
1968 | 1838 | 0.2% |
1969 | 8160 | 0.9% |
1970 | 11090 | 1.3% |
1971 | 7687 | 0.9% |
1972 | 5649 | 0.6% |
1973 | 8378 | 1.0% |
1974 | 14133 | 1.6% |
1975 | 16872 | 1.9% |
Maximum 10 values
Value | Count | Frequency (%) |
2019 | 43 | < 0.1% |
2018 | 1219 | 0.1% |
2017 | 3474 | 0.4% |
2016 | 4998 | 0.6% |
2015 | 8307 | 0.9% |
2014 | 3043 | 0.3% |
2013 | 4568 | 0.5% |
2012 | 4248 | 0.5% |
2011 | 2349 | 0.3% |
2010 | 1201 | 0.1% |
resale_price
Numeric
Distinct | 9005 |
---|---|
Distinct (%) | 1.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 307367.7417 |
Minimum | 5000 |
---|---|
Maximum | 1418000 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 6.7 MiB |
Quantile statistics
Minimum | 5000 |
---|---|
5-th percentile | 87035 |
Q1 | 188000 |
median | 286000 |
Q3 | 400000 |
95-th percentile | 600000 |
Maximum | 1418000 |
Range | 1413000 |
Interquartile range (IQR) | 212000 |
Descriptive statistics
Standard deviation | 159259.3219 |
---|---|
Coefficient of variation (CV) | 0.51813935 |
Kurtosis | 1.233728232 |
Mean | 307367.7417 |
Median Absolute Deviation (MAD) | 105500 |
Skewness | 0.8818419284 |
Sum | 2.707442605 × 10^11 |
Variance | 2.53635316 × 10^10 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
Value | Count | Frequency (%) |
300000 | 6752 | 0.8% |
280000 | 6503 | 0.7% |
350000 | 6494 | 0.7% |
320000 | 6334 | 0.7% |
250000 | 6262 | 0.7% |
260000 | 5980 | 0.7% |
400000 | 5957 | 0.7% |
380000 | 5872 | 0.7% |
360000 | 5865 | 0.7% |
330000 | 5818 | 0.7% |
Other values (8995) | 819011 | 93.0% |
Minimum 10 values
Value | Count | Frequency (%) |
5000 | 1 | < 0.1% |
5600 | 1 | < 0.1% |
5700 | 1 | < 0.1% |
5800 | 1 | < 0.1% |
6000 | 5 | < 0.1% |
6700 | 1 | < 0.1% |
7000 | 13 | < 0.1% |
7300 | 31 | < 0.1% |
7500 | 14 | < 0.1% |
7600 | 1 | < 0.1% |
Maximum 10 values
Value | Count | Frequency (%) |
1418000 | 1 | < 0.1% |
1400000 | 1 | < 0.1% |
1388888.88 | 1 | < 0.1% |
1380000 | 1 | < 0.1% |
1360000 | 1 | < 0.1% |
1350000 | 1 | < 0.1% |
1348888 | 1 | < 0.1% |
1338888 | 1 | < 0.1% |
1328000 | 1 | < 0.1% |
1310000 | 1 | < 0.1% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
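The calculation described above can be sketched in plain Python: rank both variables (ties receive the average of their positions), then compute the covariance of the ranks divided by the product of their standard deviations. The function names are illustrative, and this is a minimal sketch rather than the implementation used to produce the report.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def correlation(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)


def spearman_rho(x, y):
    """Spearman's rho: the correlation of the rank variables."""
    return correlation(average_ranks(x), average_ranks(y))


# A nonlinear but strictly increasing relationship gives rho close to 1.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))
```

Because only the ordering matters, cubing `x` leaves ρ at (numerically) 1, whereas Pearson's r on the raw values would be lower.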
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
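As a minimal illustration of this definition and of the invariance property, the sketch below computes r on the first five `floor_area_sqm` and `resale_price` values from the sample rows, then again after shifting and rescaling both variables. The function name `pearson_r` is illustrative; note that the invariance holds for positive scale factors, since a negative factor flips the sign of r.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


# First five rows of the sample: floor area (sqm) and resale price.
area = [70.0, 96.0, 91.0, 97.0, 90.0]
price = [23400.0, 70000.0, 72000.0, 60000.0, 72000.0]

r_original = pearson_r(area, price)
# Change location and scale of both variables (positive scale factors).
r_rescaled = pearson_r([10 * a - 3 for a in area], [0.001 * p + 7 for p in price])

print(r_original, r_rescaled)
```

The two printed values agree up to floating-point error.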
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
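The formula as stated corresponds to the tau-a variant, which makes no adjustment for ties. A minimal sketch is given below; the function name is illustrative, and library implementations commonly default to tau-b, which corrects for ties, so results can differ on data with many repeated values.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, with no correction for ties."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
        # direction == 0 means a tie in x or y; it counts toward neither
    return (concordant - discordant) / len(pairs)


print(kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
print(kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0
```

This loops over all pairs, so it is quadratic in the number of observations and only suitable for small samples.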
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure that was proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for it.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
First rows
| | month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | resale_price | remaining_lease |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1990-01 | JURONG WEST | 3 ROOM | 172 | BOON LAY DR | 04 TO 06 | 70.0 | IMPROVED | 1974 | 23400.0 | NaN |
1 | 1990-01 | JURONG EAST | 4 ROOM | 322 | JURONG EAST ST 31 | 07 TO 09 | 96.0 | NEW GENERATION | 1985 | 70000.0 | NaN |
2 | 1990-01 | JURONG EAST | 4 ROOM | 218 | JURONG EAST ST 21 | 10 TO 12 | 91.0 | NEW GENERATION | 1984 | 72000.0 | NaN |
3 | 1990-01 | JURONG EAST | 4 ROOM | 408 | PANDAN GDNS | 01 TO 03 | 97.0 | NEW GENERATION | 1978 | 60000.0 | NaN |
4 | 1990-01 | JURONG EAST | 4 ROOM | 251 | JURONG EAST ST 24 | 04 TO 06 | 90.0 | NEW GENERATION | 1982 | 72000.0 | NaN |
5 | 1990-01 | JURONG EAST | 4 ROOM | 251 | JURONG EAST ST 24 | 04 TO 06 | 90.0 | NEW GENERATION | 1982 | 87000.0 | NaN |
6 | 1990-01 | JURONG EAST | 4 ROOM | 232 | JURONG EAST ST 21 | 07 TO 09 | 98.0 | NEW GENERATION | 1982 | 71000.0 | NaN |
7 | 1990-01 | JURONG EAST | 4 ROOM | 316 | JURONG EAST ST 32 | 04 TO 06 | 104.0 | MODEL A | 1984 | 88000.0 | NaN |
8 | 1990-01 | JURONG EAST | 4 ROOM | 316 | JURONG EAST ST 32 | 10 TO 12 | 123.0 | MODEL A | 1984 | 75000.0 | NaN |
9 | 1990-01 | JURONG EAST | 4 ROOM | 316 | JURONG EAST ST 32 | 01 TO 03 | 105.0 | MODEL A | 1984 | 88000.0 | NaN |
Last rows
| | month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | resale_price | remaining_lease |
---|---|---|---|---|---|---|---|---|---|---|---|
880838 | 2022-09 | GEYLANG | 3 ROOM | 91 | PAYA LEBAR WAY | 10 TO 12 | 67.0 | IMPROVED | 1972 | 328000.0 | 49 years 01 month |
880839 | 2022-09 | GEYLANG | 3 ROOM | 60 | CIRCUIT RD | 04 TO 06 | 56.0 | STANDARD | 1969 | 320000.0 | 45 years 05 months |
880840 | 2022-09 | CHOA CHU KANG | EXECUTIVE | 134 | TECK WHYE LANE | 10 TO 12 | 144.0 | APARTMENT | 1993 | 738000.0 | 69 years 10 months |
880841 | 2022-09 | CHOA CHU KANG | 5 ROOM | 453 | CHOA CHU KANG AVE 4 | 07 TO 09 | 123.0 | PREMIUM APARTMENT | 2000 | 525000.0 | 76 years 06 months |
880842 | 2022-09 | CHOA CHU KANG | 4 ROOM | 5 | TECK WHYE AVE | 04 TO 06 | 104.0 | MODEL A | 1984 | 451000.0 | 60 years 08 months |
880843 | 2022-09 | CHOA CHU KANG | 4 ROOM | 816B | KEAT HONG LINK | 13 TO 15 | 92.0 | MODEL A | 2017 | 565000.0 | 94 years 01 month |
880844 | 2022-09 | CHOA CHU KANG | 4 ROOM | 691A | CHOA CHU KANG CRES | 16 TO 18 | 91.0 | MODEL A | 2003 | 465000.0 | 79 years 09 months |
880845 | 2022-09 | CHOA CHU KANG | 4 ROOM | 684D | CHOA CHU KANG CRES | 13 TO 15 | 90.0 | MODEL A | 2002 | 460000.0 | 79 years 01 month |
880846 | 2022-09 | CHOA CHU KANG | 4 ROOM | 487C | CHOA CHU KANG AVE 5 | 13 TO 15 | 93.0 | MODEL A | 2016 | 520000.0 | 92 years 10 months |
880847 | 2022-09 | YISHUN | 5 ROOM | 677C | YISHUN RING RD | 04 TO 06 | 113.0 | IMPROVED | 2018 | 655000.0 | 95 years 02 months |
Most frequently occurring
| | month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | resale_price | # duplicates |
---|---|---|---|---|---|---|---|---|---|---|---|
1231 | 2009-02 | PASIR RIS | 3 ROOM | 5 | CHANGI VILLAGE RD | 01 TO 03 | 66.0 | IMPROVED | 1981 | 177000.0 | 5 |
1228 | 2009-01 | PASIR RIS | 3 ROOM | 5 | CHANGI VILLAGE RD | 04 TO 06 | 66.0 | IMPROVED | 1981 | 177000.0 | 4 |
1233 | 2009-02 | PASIR RIS | 3 ROOM | 5 | CHANGI VILLAGE RD | 04 TO 06 | 66.0 | IMPROVED | 1981 | 177000.0 | 4 |
1240 | 2009-04 | PASIR RIS | 3 ROOM | 5 | CHANGI VILLAGE RD | 04 TO 06 | 66.0 | IMPROVED | 1981 | 177000.0 | 4 |
48 | 1990-08 | TOA PAYOH | 3 ROOM | 195 | KIM KEAT AVE | 04 TO 06 | 66.0 | IMPROVED | 1973 | 40000.0 | 3 |
66 | 1990-12 | ANG MO KIO | 4 ROOM | 334 | ANG MO KIO AVE 1 | 07 TO 09 | 91.0 | NEW GENERATION | 1982 | 80000.0 | 3 |
72 | 1990-12 | TOA PAYOH | 3 ROOM | 195 | KIM KEAT AVE | 07 TO 09 | 66.0 | IMPROVED | 1973 | 40000.0 | 3 |
86 | 1991-02 | ANG MO KIO | 3 ROOM | 343 | ANG MO KIO AVE 3 | 04 TO 06 | 73.0 | NEW GENERATION | 1978 | 47000.0 | 3 |
180 | 1992-09 | CLEMENTI | 3 ROOM | 714 | CLEMENTI WEST ST 2 | 07 TO 09 | 67.0 | NEW GENERATION | 1980 | 47000.0 | 3 |
332 | 1995-09 | BEDOK | 3 ROOM | 609 | BEDOK RESERVOIR RD | 04 TO 06 | 67.0 | NEW GENERATION | 1982 | 120000.0 | 3 |
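A table like the one above can be reproduced by counting identical rows. The sketch below does this with the standard library on a few shortened, hypothetical rows (not taken verbatim from the dataset). It reports both the number of distinct rows that repeat and the number of surplus copies, since the report does not state which of the two its duplicate count refers to.

```python
from collections import Counter


def duplicate_groups(rows):
    """Return {row: count} for every row value that occurs more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# Shortened, hypothetical rows for illustration only.
rows = [
    ("2009-02", "PASIR RIS", "3 ROOM", "5", 177000.0),
    ("2009-02", "PASIR RIS", "3 ROOM", "5", 177000.0),
    ("2009-01", "PASIR RIS", "3 ROOM", "5", 177000.0),
    ("1990-08", "TOA PAYOH", "3 ROOM", "195", 40000.0),
    ("1990-08", "TOA PAYOH", "3 ROOM", "195", 40000.0),
    ("1990-08", "TOA PAYOH", "3 ROOM", "195", 40000.0),
]

groups = duplicate_groups(rows)
surplus_rows = sum(n - 1 for n in groups.values())

# Most frequently occurring duplicates first.
for row, n in sorted(groups.items(), key=lambda item: -item[1]):
    print(n, row)
print("distinct duplicated rows:", len(groups))
print("surplus copies:", surplus_rows)
```

Rows are converted to tuples so they can be used as dictionary keys; any unhashable cell values would need to be converted first.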