Web pages today carry many forms and types of content: pictures, videos, audio files, and text in different languages. This content can be multilingual, heterogeneous, and unstructured, so mining it should be independent of language and software. In this work, statistical features of images are extracted from each image's pixel map. The extracted features are presented to two fuzzy clustering algorithms, the fuzzy c-means algorithm (FCM) and the Gath–Geva algorithm, which use Euclidean distance and Gaussian distance, respectively, as the similarity metric. The accuracy of the two algorithms is compared and presented. © Springer India 2015.
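
The pipeline described above (statistical features from the pixel map, then fuzzy clustering) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the abstract: the feature set (mean, standard deviation, skewness, and kurtosis of pixel intensities), the fuzzifier m = 2, and the synthetic test images are all hypothetical choices made for illustration. Only FCM with Euclidean distance is shown; the Gath–Geva variant and the accuracy comparison are not reproduced here.

```python
import numpy as np


def pixel_statistics(image):
    """Return [mean, std, skewness, kurtosis] of the pixel intensities.

    The choice of these four statistics is an assumption for illustration.
    """
    pixels = np.asarray(image, dtype=float).ravel()
    mean = pixels.mean()
    std = pixels.std()
    if std == 0:
        # A constant image has no spread; higher moments are set to zero.
        return np.array([mean, 0.0, 0.0, 0.0])
    z = (pixels - mean) / std
    return np.array([mean, std, (z ** 3).mean(), (z ** 4).mean()])


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means with Euclidean distance.

    Returns (centers, U), where U[i, k] is the membership of sample i
    in cluster k and every row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalised per sample.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero
        # Membership update: u_ik proportional to d_ik^(-2 / (m - 1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


# Hypothetical data: five dark and five bright 16x16 grayscale images.
rng = np.random.default_rng(1)
dark = [rng.integers(0, 80, size=(16, 16)) for _ in range(5)]
bright = [rng.integers(170, 256, size=(16, 16)) for _ in range(5)]

features = np.array([pixel_statistics(img) for img in dark + bright])
centers, U = fuzzy_c_means(features, n_clusters=2)
labels = U.argmax(axis=1)  # hard assignment from the fuzzy memberships
```

Because the features are computed directly from pixel values, the sketch does not depend on any text, language, or page markup, which is in line with the language-independence goal stated above. In practice the features would likely need scaling before clustering, since Euclidean distance is dominated by whichever feature has the largest numeric range.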