
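Detecting Binary Files in Python

Summary: Determine whether the file at a given path is binary, based on the heuristic used by the Linux file command.

Created: 2022.05.16 23:28:20

Updated: 2022.11.09 23:10:16

Preparation

Prepare two files: a binary image file, img.jpg, and a text file, text.txt.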

Shell Code

Bash
file img.jpg
file text.txt
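The output of file differs clearly between the text file and the binary file.
The file/encoding.c source file contains the following table, which file uses to decide whether a byte can appear in text: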
C
#define F 0   /* character never appears in text */
#define T 1   /* character appears in plain ASCII text */
#define I 2   /* character appears in ISO-8859 text */
#define X 3   /* character appears in non-ISO extended ASCII (Mac, IBM PC) */

private char text_chars[256] = {
    /*                  BEL BS HT LF VT FF CR    */
    F, F, F, F, F, F, F, T, T, T, T, T, T, T, F, F,  /* 0x0X */
    /*                              ESC          */
    F, F, F, F, F, F, F, F, F, F, F, T, F, F, F, F,  /* 0x1X */
    T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T,  /* 0x2X */
    T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T,  /* 0x3X */
    T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T,  /* 0x4X */
    T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T,  /* 0x5X */
    T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T,  /* 0x6X */
    T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, F,  /* 0x7X */
    /*            NEL                            */
    X, X, X, X, X, T, X, X, X, X, X, X, X, X, X, X,  /* 0x8X */
    X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X,  /* 0x9X */
    I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I,  /* 0xaX */
    I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I,  /* 0xbX */
    I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I,  /* 0xcX */
    I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I,  /* 0xdX */
    I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I,  /* 0xeX */
    I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I   /* 0xfX */
};
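To make the four categories concrete, here is a minimal Python sketch that rebuilds the same table and classifies a byte string by the most permissive category it needs. The names `build_text_chars` and `classify` are hypothetical, and the ordering of the checks is a simplification: file also tries other encodings (such as UTF-8 and UTF-16) that this sketch ignores.

```python
F, T, I, X = 0, 1, 2, 3  # same category codes as the C table

def build_text_chars() -> list:
    """Rebuild the 256-entry table from the byte ranges it describes."""
    table = [F] * 256
    for b in (7, 8, 9, 10, 11, 12, 13, 27):  # BEL BS HT LF VT FF CR ESC
        table[b] = T
    for b in range(0x20, 0x7F):              # printable ASCII (DEL excluded)
        table[b] = T
    for b in range(0x80, 0xA0):              # non-ISO extended ASCII
        table[b] = X
    table[0x85] = T                          # NEL
    for b in range(0xA0, 0x100):             # ISO-8859 text
        table[b] = I
    return table

TEXT_CHARS = build_text_chars()

def classify(data: bytes) -> str:
    """Label data by the categories of the bytes it contains (hypothetical helper)."""
    kinds = {TEXT_CHARS[b] for b in data}
    if F in kinds:
        return "binary"      # at least one byte never appears in text
    if X in kinds:
        return "extended"    # needs non-ISO extended ASCII
    if I in kinds:
        return "iso-8859"    # needs ISO-8859 high bytes
    return "ascii"

print(classify(b"hello\n"))    # ascii
print(classify(b"caf\xe9"))    # iso-8859
print(classify(b"\x00abc"))    # binary
```

Only the F category marks data as binary, which is why a simple check can treat every T, I, and X byte as text.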

Python Code

Based on the table above, the check can be implemented in Python as follows:

Python
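# Bytes the table marks as text: BEL..CR (7-13), ESC (27), and 0x20-0xFF except DEL (0x7F).
textchars = bytearray({7, 8, 9, 10, 11, 12, 13, 27} | set(range(0x20, 0x100)) - {0x7f})
# Strip every text byte; any byte left over means the data is binary.
is_binary_string = lambda data: bool(data.translate(None, textchars))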
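The check relies on `bytes.translate(None, delete)`, which returns a copy of the data with every byte listed in `delete` removed. As a small illustration on in-memory data, the sketch below shows what is left after stripping the text bytes. `TEXT_BYTES` and `non_text_bytes` are hypothetical names, and the set is assumed to include VT (byte 11), following the table.

```python
# Text-byte set following the table (VT, byte 11, included).
TEXT_BYTES = bytes({7, 8, 9, 10, 11, 12, 13, 27} | set(range(0x20, 0x100)) - {0x7F})

def non_text_bytes(data: bytes) -> bytes:
    """Return only the bytes of `data` that are not in TEXT_BYTES."""
    return data.translate(None, TEXT_BYTES)

print(non_text_bytes(b"hello, world\n"))   # b''
print(non_text_bytes(b"\x00\x01PNG\xff"))  # b'\x00\x01'
```

An empty result means every byte was a text byte, so `bool(...)` of the result is False for text and True for binary data.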
To use it, simply pass in bytes read from the file:
Python
with open('./img.jpg', 'rb') as f:
    print(is_binary_string(f.read(1024)))
with open('./text.txt', 'rb') as f:
    print(is_binary_string(f.read(1024)))
For the image the result is expected to be True, and for the text file False.
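The two steps can also be wrapped into one path-based helper. This is a minimal sketch under stated assumptions: `is_binary_file` and its `blocksize` parameter are hypothetical names, only the first 1024 bytes are inspected (as in the usage above), and the demo writes throwaway files instead of relying on img.jpg and text.txt.

```python
import os
import tempfile

TEXT_BYTES = bytes({7, 8, 9, 10, 11, 12, 13, 27} | set(range(0x20, 0x100)) - {0x7F})

def is_binary_file(path: str, blocksize: int = 1024) -> bool:
    """Guess whether `path` is binary by inspecting its first `blocksize` bytes."""
    with open(path, "rb") as f:
        chunk = f.read(blocksize)
    # Any byte that survives the deletion is a non-text byte.
    return bool(chunk.translate(None, TEXT_BYTES))

# Demo with temporary files: one plain-text file, one with JPEG-like bytes.
with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "sample.txt")
    bin_path = os.path.join(tmp, "sample.bin")
    with open(text_path, "wb") as f:
        f.write(b"plain text\n")
    with open(bin_path, "wb") as f:
        f.write(b"\xff\xd8\xff\xe0\x00\x10JFIF\x00")
    print(is_binary_file(text_path))  # False
    print(is_binary_file(bin_path))   # True
```

Two limitations follow from the heuristic: an empty file is reported as text, and UTF-16 text is reported as binary because it contains NUL bytes.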

References

  1. How can I detect if a file is binary (non-text) in Python? - Stack Overflow
  2. file/encoding.c at f2a6e7cb7db9b5fd86100403df6b2f830c7f22ba · file/file · GitHub