A Utility Class for Detecting File Types

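At work I needed to determine a file's type fairly accurately. After some research, I used the parser from Apache Tika to obtain the file's contentType, and then decide the file's type from that value myself.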
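The "decide the type myself" step boils down to classifying a MIME string such as `image/png`. Below is a minimal sketch of that step using only the JDK, so it runs without Tika on the classpath. The class `ContentTypeSketch` and its methods are hypothetical names introduced for illustration, not part of Tika or of the utility class in this post.

```java
import java.util.Locale;

/** Hypothetical helper: classifies a MIME content-type string by its top-level type. */
final class ContentTypeSketch {

    private ContentTypeSketch() {
    }

    /** Returns the top-level type ("image", "text", ...), or "" if the value is not of the form type/subtype. */
    static String topLevelType(String contentType) {
        if (contentType == null) {
            return "";
        }
        // Drop parameters such as "; charset=UTF-8" before inspecting the type.
        int semicolon = contentType.indexOf(';');
        String bare = (semicolon >= 0 ? contentType.substring(0, semicolon) : contentType).trim();
        int slash = bare.indexOf('/');
        if (slash <= 0 || slash == bare.length() - 1) {
            return "";
        }
        return bare.substring(0, slash).toLowerCase(Locale.ROOT);
    }

    /** True only when the top-level type is exactly "image". */
    static boolean isImage(String contentType) {
        return "image".equals(topLevelType(contentType));
    }

    public static void main(String[] args) {
        System.out.println(isImage("image/png"));                  // true
        System.out.println(isImage("text/plain; charset=UTF-8"));  // false
        System.out.println(isImage("application/x-image-viewer")); // false
    }
}
```

Comparing the top-level type exactly, rather than searching for the word "image" anywhere in the string, avoids false positives on subtypes that merely contain that word.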

Here is the utility class I wrote, for the record.

package com.relic;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.HttpHeaders;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.metadata.TikaMetadataKeys;
import org.apache.tika.parser.AutoDetectParser;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.regex.Pattern;

/**
 * Utility class for detecting a file's type.
 * <p> @date: 2020-10-28 15:35</p>
 *
 * @author Lesible
 */
public class FileTypeDetector {

    private static final Logger log = LoggerFactory.getLogger(FileTypeDetector.class);

    // Matches content types in the image family, e.g. "image/png".
    private static final Pattern IMG_PATTERN = Pattern.compile("^image/");

    /** Returns the content type Tika reports for the file, e.g. "image/png". */
    public static String detectFileType(File file) {
        if (file == null || !file.exists()) {
            throw new IllegalArgumentException("file must exist");
        }
        AutoDetectParser parser = new AutoDetectParser();
        Metadata metadata = new Metadata();
        // Pass the file name as a detection hint. TikaMetadataKeys is the Tika 1.x
        // home of this constant; Tika 2.x exposes it as TikaCoreProperties.RESOURCE_NAME_KEY.
        metadata.set(TikaMetadataKeys.RESOURCE_NAME_KEY, file.getName());
        try (InputStream inputStream = new FileInputStream(file)) {
            parser.parse(inputStream, new DefaultHandler(), metadata);
        } catch (IOException e) {
            throw new RuntimeException("failed to open, read, or close file: " + file, e);
        } catch (TikaException | SAXException e) {
            throw new RuntimeException("parse failed: " + file, e);
        }
        return metadata.get(HttpHeaders.CONTENT_TYPE);
    }

    /** Returns true if the detected content type starts with "image/". */
    public static boolean isImageType(File file) {
        String contentType = detectFileType(file);
        return contentType != null && IMG_PATTERN.matcher(contentType).find();
    }

    public static void main(String[] args) {
        File file = new File("E:\\Users\\Lesible\\Pictures\\Saved Pictures\\centos.png");
        log.info("file : {}, is this an image file? {}", file.getName(), isImageType(file));
    }
}

The result was perfect: even a file whose extension has been changed is still recognized as its correct type.
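A renamed file can still be recognized because detection looks at the file's content, not only at its name. As a rough illustration of that idea, the sketch below checks the 8-byte PNG signature of a file that carries a `.txt` extension. It uses only the JDK and assumes Java 11 or later (for `InputStream.readNBytes(int)`). It is not how Tika is implemented internally, since Tika consults a much larger set of signatures; `PngSignatureSketch` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical illustration of content sniffing; not Tika's implementation. */
final class PngSignatureSketch {

    // The fixed 8-byte signature that starts every PNG file.
    private static final byte[] PNG_SIGNATURE = {
            (byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A
    };

    private PngSignatureSketch() {
    }

    /** True if the given leading bytes begin with the PNG signature. */
    static boolean hasPngSignature(byte[] leadingBytes) {
        return leadingBytes != null
                && leadingBytes.length >= PNG_SIGNATURE.length
                && Arrays.equals(Arrays.copyOf(leadingBytes, PNG_SIGNATURE.length), PNG_SIGNATURE);
    }

    /** Reads the first 8 bytes of the file and ignores its name entirely. */
    static boolean looksLikePng(Path path) {
        try (InputStream in = Files.newInputStream(path)) {
            return hasPngSignature(in.readNBytes(PNG_SIGNATURE.length));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // A file with PNG content but a misleading ".txt" extension.
        Path disguised = Files.createTempFile("disguised", ".txt");
        try {
            Files.write(disguised, PNG_SIGNATURE);
            System.out.println(looksLikePng(disguised)); // true
        } finally {
            Files.deleteIfExists(disguised);
        }
    }
}
```

A single hand-written signature check like this only covers one format, which is the practical reason for delegating to a library such as Tika.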
