Menu

DCT HASH MATCHING QUALITY FOR RESIZED IMAGES 2

pHash does it’s mathematical operations for every pixels for original image size. Therefore, when the image is resized, the result is slightly different depending on image size. My assumption is that if every image is resized to certain size when the image is bigger than the size, the general matching quality would be better.

I tested the same set of image samples with previous posting, however, because of the speed, the comparison performed for 3644 images.

To find which size is good for normalization, I resized images to 2000, 1500, and 1000 width. And hamming distance between resized image to from 90% to 10%.

Hamming Distance is bigger than 4

normalization size 2000

5000 <

4000 <

3000 <

2000 <

1000 <

< 1000

5000 <

4000 <

3000 <

2000 <

1000 <

< 1000

100%

0

0

0

0

0

1

100%

0.00

0.00

0.00

0.00

0.00

0.01

90%

0

0

0

0

6

17

90%

0.00

0.00

0.00

0.00

0.08

0.23

80%

0

0

0

1

12

19

80%

0.00

0.00

0.00

0.01

0.16

0.25

70%

0

0

0

1

18

36

70%

0.00

0.00

0.00

0.01

0.24

0.48

60%

0

0

0

12

48

87

60%

0.00

0.00

0.00

0.16

0.64

1.16

50%

0

0

3

26

77

141

50%

0.00

0.00

0.04

0.35

1.03

1.89

40%

0

0

9

62

172

272

40%

0.00

0.00

0.12

0.83

2.30

3.64

30%

1

12

54

156

333

475

30%

0.01

0.16

0.72

2.09

4.45

6.35

20%

27

99

246

424

693

851

20%

0.36

1.32

3.29

5.67

9.27

11.38

10%

163

360

753

1093

1442

1636

10%

2.18

4.82

10.07

14.62

19.29

21.89

normalization size 1500

5000 <

4000 <

3000 <

2000 <

1000 <

< 1000

5000 <

4000 <

3000 <

2000 <

1000 <

< 1000

100%

0

0

0

0

0

1

100%

0.00

0.00

0.00

0.00

0.00

0.01

90%

0

0

0

0

2

13

90%

0.00

0.00

0.00

0.00

0.03

0.17

80%

0

0

0

0

7

14

80%

0.00

0.00

0.00

0.00

0.09

0.19

70%

0

0

0

1

15

33

70%

0.00

0.00

0.00

0.01

0.20

0.44

60%

0

0

0

2

25

64

60%

0.00

0.00

0.00

0.03

0.33

0.86

50%

0

0

0

7

46

110

50%

0.00

0.00

0.00

0.09

0.62

1.47

40%

0

0

4

25

123

223

40%

0.00

0.00

0.05

0.33

1.65

2.98

30%

0

0

18

86

247

389

30%

0.00

0.00

0.24

1.15

3.30

5.20

20%

6

27

116

257

520

678

20%

0.08

0.36

1.55

3.44

6.96

9.07

10%

137

308

654

969

1313

1507

10%

1.83

4.12

8.75

12.96

17.57

20.16

normalization size 1000

5000 <

4000 <

3000 <

2000 <

1000 <

< 1000

5000 <

4000 <

3000 <

2000 <

1000 <

< 1000

100%

0

0

0

0

0

1

100%

0.00

0.00

0.00

0.00

0.00

0.01

90%

0

0

0

0

0

11

90%

0.00

0.00

0.00

0.00

0.00

0.15

80%

0

0

0

0

0

7

80%

0.00

0.00

0.00

0.00

0.00

0.09

70%

0

0

0

0

5

23

70%

0.00

0.00

0.00

0.00

0.07

0.31

60%

0

0

0

0

6

45

60%

0.00

0.00

0.00

0.00

0.08

0.60

50%

0

0

0

0

26

90

50%

0.00

0.00

0.00

0.00

0.35

1.20

40%

0

0

0

3

56

156

40%

0.00

0.00

0.00

0.04

0.75

2.09

30%

0

0

2

17

132

274

30%

0.00

0.00

0.03

0.23

1.77

3.67

20%

0

4

39

122

354

512

20%

0.00

0.05

0.52

1.63

4.74

6.85

10%

61

161

406

679

999

1193

10%

0.82

2.15

5.43

9.08

13.36

15.96

Hamming Distance is bigger than 6

normalization size 2000

5000 <

4000 <

3000 <

2000 <

1000 <

< 1000

5000 <

4000 <

3000 <

2000 <

1000 <

< 1000

100%

0

0

0

0

0

0

100%

0.00

0.00

0.00

0.00

0.00

0.00

90%

0

0

0

0

0

1

90%

0.00

0.00

0.00

0.00

0.00

0.01

80%

0

0

0

1

2

3

80%

0.00

0.00

0.00

0.01

0.03

0.04

70%

0

0

0

0

4

11

70%

0.00

0.00

0.00

0.00

0.05

0.15

60%

0

0

0

0

8

21

60%

0.00

0.00

0.00

0.00

0.11

0.28

50%

0

0

0

6

20

46

50%

0.00

0.00

0.00

0.08

0.27

0.62

40%

0

0

4

21

46

94

40%

0.00

0.00

0.05

0.28

0.62

1.26

30%

0

0

11

45

106

175

30%

0.00

0.00

0.15

0.60

1.42

2.34

20%

4

14

63

142

286

381

20%

0.05

0.19

0.84

1.90

3.83

5.10

10%

59

153

347

539

752

869

10%

0.79

2.05

4.64

7.21

10.06

11.63

normalization size 1500

5000 <

4000 <

3000 <

2000 <

1000 <

< 1000

5000 <

4000 <

3000 <

2000 <

1000 <

< 1000

100%

0

0

0

0

0

0

100%

0.00

0.00

0.00

0.00

0.00

0.00

90%

0

0

0

0

0

1

90%

0.00

0.00

0.00

0.00

0.00

0.01

80%

0

0

0

0

0

1

80%

0.00

0.00

0.00

0.00

0.00

0.01

70%

0

0

0

0

2

9

70%

0.00

0.00

0.00

0.00

0.03

0.12

60%

0

0

0

0

6

19

60%

0.00

0.00

0.00

0.00

0.08

0.25

50%

0

0

0

1

10

36

50%

0.00

0.00

0.00

0.01

0.13

0.48

40%

0

0

0

8

28

76

40%

0.00

0.00

0.00

0.11

0.37

1.02

30%

0

0

3

26

81

150

30%

0.00

0.00

0.04

0.35

1.08

2.01

20%

1

4

30

88

221

316

20%

0.01

0.05

0.40

1.18

2.96

4.23

10%

39

99

257

433

639

756

10%

0.52

1.32

3.44

5.79

8.55

10.11

normalization size 1000

5000 <

4000 <

3000 <

2000 <

1000 <

< 1000

5000 <

4000 <

3000 <

2000 <

1000 <

< 1000

100%

0

0

0

0

0

0

100%

0.00

0.00

0.00

0.00

0.00

0.00

90%

0

0

0

0

0

1

90%

0.00

0.00

0.00

0.00

0.00

0.01

80%

0

0

0

0

0

1

80%

0.00

0.00

0.00

0.00

0.00

0.01

70%

0

0

0

0

1

8

70%

0.00

0.00

0.00

0.00

0.01

0.11

60%

0

0

0

0

2

15

60%

0.00

0.00

0.00

0.00

0.03

0.20

50%

0

0

0

0

9

35

50%

0.00

0.00

0.00

0.00

0.12

0.47

40%

0

0

0

0

14

62

40%

0.00

0.00

0.00

0.00

0.19

0.83

30%

0

0

0

4

35

104

30%

0.00

0.00

0.00

0.05

0.47

1.39

20%

0

0

11

39

138

233

20%

0.00

0.00

0.15

0.52

1.85

3.12

10%

14

38

135

270

449

566

10%

0.19

0.51

1.81

3.61

6.01

7.57

Conclusion

According to the test result, in terms of matching percentage, resizing before hashing gives better results; this can be a solution for better matching. However, false positive matching percentage is important.