We introduce a new photometric estimator of the HI mass fraction (M_HI/M_*) in local galaxies, which is a linear combination of four parameters: stellar mass, stellar surface mass density, NUV-r colour, and g-i colour gradient. It is calibrated using samples of nearby galaxies (0.025<z<0.05) with HI line detections from the GASS and ALFALFA surveys, and it is demonstrated to provide unbiased M_HI/M_* estimates even for HI-rich galaxies. We apply this estimator to a sample of ~24,000 galaxies from the SDSS/DR7 in the same redshift range. We then bin these galaxies by stellar mass and HI mass fraction and compute projected two-point cross-correlation functions with respect to a reference galaxy sample. The results are compared with predictions from current semi-analytic models of galaxy formation. The agreement is good for galaxies with stellar masses larger than 10^10 M_sun, but not for lower-mass systems. We then extend the analysis by studying the bias in the clustering of HI-poor and HI-rich galaxies with respect to galaxies with normal HI content on scales between 100 kpc and ~5 Mpc. For the HI-deficient population, the strongest bias effects arise when the HI deficiency is defined in comparison to galaxies of the same stellar mass and size. This is not reproduced by the semi-analytic models, in which the quenching of star formation in satellites occurs by 'starvation' and does not depend on their internal structure. HI-rich galaxies with stellar masses greater than 10^10 M_sun are found to be anti-biased compared to galaxies with 'normal' HI content. Interestingly, no such effect is found for lower-mass galaxies.
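
For orientation, the general form of such a four-parameter linear estimator, and of a relative clustering bias, can be sketched as follows. This is an illustrative sketch only. The coefficients a, b, c, d and the offset e are placeholders, not values from this work. The use of base-10 logarithms, the symbol mu_* for the stellar surface mass density, the notation Delta_{g-i} for the colour gradient, and the definition of the bias as a ratio of projected cross-correlation functions w_p are all assumptions made here for concreteness.

```latex
% Assumed general form of the estimator; a, b, c, d, e are placeholder
% coefficients that would be fixed by calibration on the HI-detected samples.
\[
  \log_{10}\!\left(\frac{M_{\rm HI}}{M_*}\right)
  \approx a\,\log_{10} M_* + b\,\log_{10}\mu_*
        + c\,(\mathrm{NUV}-r) + d\,\Delta_{g-i} + e
\]

% Assumed definition of the relative bias at projected separation r_p:
% the cross-correlation of an HI-poor or HI-rich subsample with the
% reference sample, divided by that of the 'normal' HI subsample.
\[
  b_{\rm rel}(r_p) = \frac{w_p^{\rm sub}(r_p)}{w_p^{\rm normal}(r_p)}
\]
```

Under these assumed definitions, b_rel > 1 would correspond to a subsample that is more strongly clustered than galaxies with normal HI content, and b_rel < 1 to an anti-biased subsample.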