A Deep Dive into File Uploads over HTTP
Introduction to Content-Type
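The Content-Type entity header indicates the MIME type (Multipurpose Internet Mail Extensions) of a resource. A MIME type, usually called a media type or content type, is a string sent along with a file to indicate the file's type; for example, an audio file might be labeled audio/ogg and an image file image/png.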
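As a hedged illustration of how a media type travels with a file, the sketch below uses Python's standard mimetypes and email modules. One helper guesses a type from the file extension and falls back to application/octet-stream; the other splits a Content-Type value into the media type and its parameters. Both helper names are hypothetical, and mimetypes only guesses from the extension and the system's type tables, so results for unusual extensions may vary between machines.

```python
import mimetypes
from email.message import Message
from typing import Dict, Tuple


def guess_media_type(filename: str) -> str:
    """Hypothetical helper: guess a media type from the extension, else generic binary."""
    media_type, _encoding = mimetypes.guess_type(filename)
    return media_type or "application/octet-stream"


def parse_content_type(value: str) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: split e.g. 'text/html; charset=utf-8' into type and params."""
    msg = Message()
    msg["Content-Type"] = value
    # get_params() returns [(media_type, ''), (param_name, param_value), ...]
    # with quotes already removed from the values.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params
```

For instance, parse_content_type("text/html; charset=utf-8") gives ("text/html", {"charset": "utf-8"}).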
Content-Type for File Uploads
multipart/form-data and application/octet-stream are two different HTTP Content-Type values, each suited to a different kind of file upload:
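multipart/form-data is the standard way to send form data and files in an HTTP request. With this type, the request body is split into multiple parts, each holding one form field or one file, and the parts are separated by a specific delimiter (the boundary) so that the server can parse the request correctly.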
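application/octet-stream is a generic MIME type meaning a stream of binary data. It is typically used to send binary data without any accompanying metadata, such as image, audio, or video files. When application/octet-stream is used, the HTTP request body consists of the raw binary data stream and nothing else.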
application/octet-stream example
The octet-stream subtype is used to indicate that a body contains arbitrary binary data. It has two optional parameters, TYPE and PADDING.
When a file is uploaded to Huawei OBS object storage with an HTTP PUT request, the file content makes up the entire body of the PUT request.
Note: If you pass a file object as the data parameter, aiohttp will stream it to the server automatically (see "streaming uploads" in the aiohttp documentation).
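A minimal sketch of such a PUT upload, using only Python's standard library rather than aiohttp: the open file object is passed as the request body together with an explicit Content-Length, so http.client sends it in blocks instead of reading the whole file into memory first. put_file, RecordingHandler and demo are hypothetical names; the /bucket/object.bin path is a placeholder, and a real OBS endpoint normally also requires authentication (for example a signed Authorization header or a pre-signed URL), which is omitted here. RecordingHandler is only a local stand-in server so the example can run offline.

```python
import os
import tempfile
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional


def put_file(url: str, path: str, opener: Optional[urllib.request.OpenerDirector] = None) -> int:
    """Hypothetical helper: PUT a file so that its bytes are the entire request body."""
    opener = opener or urllib.request.build_opener()
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        req = urllib.request.Request(
            url,
            data=f,  # file object: http.client reads and sends it in blocks
            method="PUT",
            headers={
                "Content-Type": "application/octet-stream",
                "Content-Length": str(size),  # known size, so no chunked encoding
            },
        )
        # NOTE: real OBS uploads also need authentication headers (not shown).
        with opener.open(req) as resp:
            return resp.status


class RecordingHandler(BaseHTTPRequestHandler):
    """Local stand-in for an object store: remembers the last PUT it received."""
    received: dict = {}

    def do_PUT(self):
        length = int(self.headers["Content-Length"])
        RecordingHandler.received = {
            "path": self.path,
            "content_type": self.headers["Content-Type"],
            "body": self.rfile.read(length),
        }
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass


def demo(payload: bytes) -> tuple:
    """Upload `payload` to a throwaway local server and return (status, received)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(payload)
    server = HTTPServer(("127.0.0.1", 0), RecordingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Bypass any proxy settings from the environment for this local test.
        no_proxy = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        url = f"http://127.0.0.1:{server.server_port}/bucket/object.bin"  # placeholder path
        status = put_file(url, path, opener=no_proxy)
        return status, RecordingHandler.received
    finally:
        server.shutdown()
        server.server_close()
        os.remove(path)
```

The server sees exactly the file's bytes as the body, with no boundary or per-part headers, which is the defining property of an application/octet-stream upload.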
Definition of multipart/form-data
In many applications, it is possible for a user to be presented with a form. The user will fill out the form, including information that is typed, generated by user input, or included from files that the user has selected. When the form is filled out, the data from the form is sent from the user to the receiving application. The definition of multipart/form-data is derived from one of those applications.
Common HTML form elements:
- Text box: <input type="text">
- Password box: <input type="password">
- Checkbox: <input type="checkbox">
- Radio button: <input type="radio">
- Drop-down list: <select>
- Text area: <textarea>
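When a form is submitted, its data can be sent to the server with one of two methods: GET or POST. GET appends the form data to the end of the URL and suits small amounts of non-sensitive data; POST carries the form data in the HTTP request body and suits large or sensitive data.
Form data must be encoded before it is submitted. HTML forms mainly use two encoding types: application/x-www-form-urlencoded and multipart/form-data. The former is for ordinary form data (key-value pairs); the latter is for forms that include file uploads. In the application/x-www-form-urlencoded format, form data is encoded as key-value pairs: each key is joined to its value with an equals sign (=), and pairs are separated by an ampersand (&). The format also URL-encodes (percent-encodes) certain characters; for example, a space is encoded as + and the special character @ is encoded as %40.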
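The urlencoded rules above can be checked with Python's urllib.parse. A minimal sketch; the field names, values, and the example.com URL are made up for illustration:

```python
from urllib.parse import parse_qs, urlencode

form = {"name": "John Doe", "email": "joe@example.com"}

# application/x-www-form-urlencoded: key=value pairs joined by "&",
# spaces become "+", and reserved characters such as "@" are percent-encoded.
encoded = urlencode(form)

# With GET the same string is appended to the URL after "?";
# with POST it becomes the request body instead.
get_url = "https://example.com/search?" + encoded

# Decoding restores the original pairs (each value comes back as a list).
decoded = parse_qs(encoded)
```

Here encoded is name=John+Doe&email=joe%40example.com, matching the space-to-+ and @-to-%40 rules described above.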
A multipart/form-data body contains a series of parts separated by a boundary.
Boundary Parameter of multipart/form-data
As with other multipart types, the parts are delimited with a boundary delimiter, constructed using CRLF, "--", and the value of the boundary parameter.
Content-Disposition Header Field for Each Part
Each part MUST contain a Content-Disposition header field [RFC2183] where the disposition type is form-data. The Content-Disposition header field MUST also contain an additional parameter of name; the value of the name parameter is the original field name from the form (possibly encoded; see Section 5.1 of RFC 7578).
In most multipart types, the MIME header fields in each part are restricted to US-ASCII; for compatibility with those systems, file names normally visible to users MAY be encoded using the percent-encoding method.
Content-Type Header Field for Each Part
Each part MAY have an (optional) Content-Type header field, which defaults to text/plain. If the contents of a file are to be sent, the file data SHOULD be labeled with an appropriate media type, if known, or application/octet-stream.
The Charset Parameter for text/plain Form Data
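In the case where the form data is text, the charset parameter for the text/plain Content-Type MAY be used to indicate the character encoding used in that part:

    --AaB03x
    content-disposition: form-data; name="field1"
    content-type: text/plain;charset=UTF-8
    content-transfer-encoding: quoted-printable

    Joe owes =E2=82=AC100.
    --AaB03x

Content-Transfer-Encoding states how the data has been encoded so that it survives different transport protocols. Some transport protocols were not designed to carry binary data or special characters, so an encoding such as Base64 or Quoted-Printable is applied to keep the data intact between sender and receiver.
For example, an HTML email containing non-ASCII characters uses Content-Transfer-Encoding: quoted-printable so that every character is transmitted correctly, while attachments such as images or PDF files are sent with Content-Transfer-Encoding: base64. Both Base64 and Quoted-Printable exist mainly to turn non-ASCII or binary data into a form that can be handled in an ASCII-only environment, so that the data can travel over ASCII-only protocols such as email, which was originally designed to carry text only.
Base64: represents binary data with 64 printable characters, encoding every 3 input bytes as 4 characters; it suits arbitrary binary data.
Quoted-Printable: mainly used to encode non-ASCII text in email. Most printable ASCII characters are left as-is, and every other byte is written as = followed by two hexadecimal digits; in the example above, the three UTF-8 bytes of € become =E2=82=AC.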
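To make the two encodings concrete, the sketch below uses Python's standard quopri and base64 modules. The helper names are hypothetical, and the code only illustrates the encodings themselves, not how any particular HTTP client sets Content-Transfer-Encoding.

```python
import base64
import quopri


def to_quoted_printable(text: str, charset: str = "utf-8") -> bytes:
    """Hypothetical helper: Quoted-Printable keeps ASCII readable, other bytes become =XX."""
    return quopri.encodestring(text.encode(charset))


def to_base64(data: bytes) -> bytes:
    """Hypothetical helper: Base64 turns every 3 input bytes into 4 ASCII characters."""
    return base64.b64encode(data)
```

to_quoted_printable("Joe owes €100.") produces b"Joe owes =E2=82=AC100.", the body of the example above, while Base64 output is unreadable but compact, which is why it is preferred for binary attachments.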
Other Content- Header Fields
The multipart/form-data media type does not support any MIME header fields in parts other than Content-Type, Content-Disposition, and (in limited circumstances) Content-Transfer-Encoding.
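To make the boundary and per-part header rules concrete, here is a hedged sketch that assembles a multipart/form-data body by hand in Python. build_multipart is a hypothetical helper, not a library API; HTTP client libraries normally build this body for you. It follows the rules above: each part starts with "--" plus the boundary, carries a Content-Disposition with name (plus filename for files), file parts get an explicit Content-Type, text fields rely on the text/plain default, and the body ends with the close delimiter "--boundary--".

```python
import uuid
from typing import Dict, Optional, Tuple


def build_multipart(
    fields: Dict[str, str],
    files: Dict[str, Tuple[str, bytes, str]],
    boundary: Optional[str] = None,
) -> Tuple[bytes, str]:
    """Hypothetical helper returning (body, Content-Type value) for multipart/form-data.

    fields: form field name -> text value
    files:  form field name -> (filename, file bytes, media type)
    """
    # The boundary must never occur inside any part; a random one is very unlikely to.
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines.append(delimiter)
        # Names are inserted unescaped: quotes or non-ASCII would need extra handling.
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"))
        # No Content-Type header, so this part defaults to text/plain.
        lines.append(b"")  # empty line separates part headers from part content
        lines.append(value.encode("utf-8"))
    for name, (filename, data, media_type) in files.items():
        lines.append(delimiter)
        lines.append(
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode("utf-8")
        )
        lines.append(f"Content-Type: {media_type}".encode("ascii"))
        lines.append(b"")
        lines.append(data)
    lines.append(delimiter + b"--")  # close delimiter ends the body
    body = b"\r\n".join(lines) + b"\r\n"
    return body, f"multipart/form-data; boundary={boundary}"
```

Passing a fixed boundary such as "AaB03x" makes the output easy to compare with the RFC examples; in practice the random default is safer.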
multipart/form-data script example
Multipart-encoded files can be sent with Python's aiohttp module.
A file-upload exchange captured with Wireshark
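On the wire, an upload body is just the raw multipart bytes. As a hedged sketch of how a server might split such a body back into fields, the code below reuses Python's email parser, which understands MIME multipart syntax, by prepending the request's Content-Type header to the body. This is a convenience assumption rather than a dedicated form-data parser: it does not enforce RFC 7578's rules (for example, that every part has a Content-Disposition), and parse_multipart is a hypothetical name.

```python
from email.parser import BytesParser
from email.policy import HTTP


def parse_multipart(body: bytes, content_type: str) -> list:
    """Hypothetical helper: split a multipart/form-data body into part dictionaries."""
    # The email parser needs the Content-Type header (with its boundary)
    # in front of the body to know where the parts start and end.
    head = f"Content-Type: {content_type}\r\n\r\n".encode("ascii")
    message = BytesParser(policy=HTTP).parsebytes(head + body)
    parts = []
    for part in message.iter_parts():
        parts.append({
            "name": part.get_param("name", header="content-disposition"),
            "filename": part.get_filename(),
            # Parts without a Content-Type header default to text/plain.
            "content_type": part.get_content_type(),
            "data": part.get_payload(decode=True),
        })
    return parts
```

The CRLF before each delimiter is treated as part of the delimiter, so each part's data comes back without a trailing line break.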
X-Content-Type-Options: nosniff means the following:
The X-Content-Type-Options response HTTP header is a marker used by the server to indicate that the MIME types advertised in the Content-Type headers should be followed and not be changed. The header allows you to avoid MIME type sniffing by saying that the MIME types are deliberately configured.
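As a sketch of where this header goes, here is a hypothetical WSGI endpoint that answers an upload with JSON and sends X-Content-Type-Options: nosniff next to its declared Content-Type. upload_result_app and call_app are illustrative names; call_app is a small harness that invokes the app directly instead of running a server.

```python
from wsgiref.util import setup_testing_defaults


def upload_result_app(environ, start_response):
    """Hypothetical WSGI endpoint that answers an upload with a JSON status."""
    body = b'{"status": "ok"}'
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        # Ask the browser to trust the declared Content-Type instead of sniffing.
        ("X-Content-Type-Options", "nosniff"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app):
    """Invoke a WSGI app directly and collect (status, headers, body)."""
    captured = {}
    written = []

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)
        return written.append  # the write() callable PEP 3333 requires

    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal valid WSGI environ
    result = app(environ, start_response)
    body = b"".join(written) + b"".join(result)
    return captured["status"], captured["headers"], body
```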
Percent-Encoding Option
Percent-encoding (as defined in RFC 3986) is offered as a possible way of encoding characters in file names that are otherwise disallowed, including non-ASCII characters, spaces, control characters, and so forth. The encoding is created by replacing each non-ASCII or disallowed character with a sequence in which each byte of the character's UTF-8 encoding is represented by a percent sign (%) followed by the (case-insensitive) hexadecimal value of that byte.
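The rule above matches what Python's urllib.parse.quote does with its default UTF-8 encoding. A minimal sketch; percent_encode_filename and content_disposition are hypothetical helpers. Passing safe="" also escapes "/", and a double quote in a name becomes %22, so the result can sit safely inside the quoted filename parameter.

```python
from urllib.parse import quote, unquote


def percent_encode_filename(filename: str) -> str:
    """Hypothetical helper: percent-encode each UTF-8 byte of disallowed characters."""
    # Letters, digits and "_.-~" are left as-is; everything else becomes %XX.
    return quote(filename, safe="", encoding="utf-8")


def content_disposition(field_name: str, filename: str) -> str:
    """Hypothetical helper: Content-Disposition value for a file part."""
    return f'form-data; name="{field_name}"; filename="{percent_encode_filename(filename)}"'
```

For example, percent_encode_filename("résumé final.pdf") yields r%C3%A9sum%C3%A9%20final.pdf, and unquote reverses it. Python emits uppercase hex digits, although the RFC treats them as case-insensitive.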