mongodump

Synopsis

mongodump is a utility for creating a binary export of the contents of a database. Consider using this utility as part of an effective backup strategy. Use mongodump in conjunction with mongorestore to restore databases.

mongodump can read data from either mongod or mongos instances.

See also

mongorestore, Backup a Sharded Cluster with Database Dumps and MongoDB Backup Methods.

Behavior

mongodump does not dump the content of the local database.

The data format used by mongodump from version 2.2 or later is incompatible with earlier versions of mongod. Do not use recent versions of mongodump to back up older data stores.

Changed in version 3.0.5: For a sharded cluster where the shards are replica sets, mongodump, when run against the mongos instance, no longer prefers reads from secondary members.

Changed in version 2.2: When used in combination with fsync or db.fsyncLock(), mongod will block reads, including those from mongodump, when a queued write operation waits behind the fsync lock. Do not use mongodump with db.fsyncLock().

mongodump overwrites output files if they exist in the backup data folder. Before running the mongodump command multiple times, either ensure that you no longer need the files in the output folder (the default is the dump/ folder) or rename the folders or files.
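Because existing files are overwritten, one way to keep earlier dumps is to give every run its own output directory. The following is a minimal sketch, not part of the mongodump tool itself: the function name dump_to_dated_dir, the MONGODUMP override variable, and the /opt/backup path are assumptions made for illustration.

```shell
# MONGODUMP is a hypothetical override so the function can be tried
# without a running server (for example MONGODUMP=echo).
MONGODUMP="${MONGODUMP:-mongodump}"

# Write each dump to <base>/mongodump-<timestamp> so that a later run
# does not overwrite the files of an earlier one.
dump_to_dated_dir() {
  base="$1"
  stamp="$(date +%Y-%m-%d_%H%M%S)"
  "$MONGODUMP" --out "$base/mongodump-$stamp"
}
```

Calling dump_to_dated_dir /opt/backup would then write to a directory such as /opt/backup/mongodump-<date>_<time>, using only the --out option described under Options.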

Required Access

Backup Collections

To back up all the databases in a cluster via mongodump, you should have the backup role. The backup role provides the required privileges for backing up all databases. The role confers no additional access, in keeping with the policy of least privilege.

To back up a given database, you must have read access on the database. Several roles provide this access, including the backup role.

To back up the system.profile collection, which is created when you activate database profiling, you must have additional read access on this collection. Several roles provide this access, including the clusterAdmin and dbAdmin roles.

Backup Users

Changed in version 2.6.

To back up users and user-defined roles for a given database, you must have access to the admin database. MongoDB stores the user data and role definitions for all databases in the admin database.

Specifically, to back up a given database’s users, you must have the find action on the admin database’s admin.system.users collection. The backup and userAdminAnyDatabase roles both provide this privilege.

To back up the user-defined roles on a database, you must have the find action on the admin database’s admin.system.roles collection. Both the backup and userAdminAnyDatabase roles provide this privilege.

Options

Changed in version 3.0.0: mongodump removed the --dbpath option as well as the related --directoryperdb and --journal options. You must use mongodump while connected to a mongod instance.

mongodump

--help

Returns information on the options and use of mongodump.

--verbose, -v

Increases the amount of internal reporting returned on standard output or in log files. Increase the verbosity with the -v form by including the option multiple times (e.g. -vvvvv).

--quiet

Runs the mongodump in a quiet mode that attempts to limit the amount of output.

This option suppresses:

    output from database commands
    replication activity
    connection accepted events
    connection closed events

--version

Returns the mongodump release number.

--host <hostname><:port>, -h <hostname><:port>

Default: localhost:27017

Specifies a resolvable hostname for the mongod to which to connect. By default, the mongodump attempts to connect to a MongoDB instance running on the localhost on port number 27017.

To connect to a replica set, specify the replSetName and a seed list of set members, as in the following:

<replSetName>/<hostname1><:port>,<hostname2><:port>,<...>

You can always connect directly to a single MongoDB instance by specifying the host and port number directly.

Changed in version 3.0.0: If you use IPv6 and use the <address>:<port> format, you must enclose the portion of an address and port combination in brackets (e.g. [<address>]).
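As an illustration of the seed-list format, the sketch below joins a replica set name and its members into the <replSetName>/<hostname1><:port>,<hostname2><:port> form and passes it to --host. The function names replset_seed and dump_from_replset, the MONGODUMP override variable, and all host names are hypothetical.

```shell
# MONGODUMP is a hypothetical override (e.g. MONGODUMP=echo for a dry run).
MONGODUMP="${MONGODUMP:-mongodump}"

# Print "<replSetName>/<member1>,<member2>,..." from a set name
# followed by any number of host:port members.
replset_seed() {
  set_name="$1"
  shift
  members=""
  for m in "$@"; do
    if [ -z "$members" ]; then
      members="$m"
    else
      members="$members,$m"
    fi
  done
  printf '%s/%s\n' "$set_name" "$members"
}

# Run mongodump against the replica set described by the arguments.
dump_from_replset() {
  "$MONGODUMP" --host "$(replset_seed "$@")"
}
```

For example, dump_from_replset rs0 host1.example.net:27017 host2.example.net:27017 passes rs0/host1.example.net:27017,host2.example.net:27017 to --host. An IPv6 member would be written with the address in brackets, such as [::1]:27017.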

--port

Default: 27017

Specifies the TCP port on which the MongoDB instance listens for client connections.

--ipv6

Enables IPv6 support and allows the mongodump to connect to the MongoDB instance using an IPv6 network. All MongoDB programs and processes disable IPv6 support by default.

--ssl

New in version 2.6.

Enables connection to a mongod or mongos that has TLS/SSL support enabled.

Changed in version 3.0: Most MongoDB distributions now include support for TLS/SSL. See Configure mongod and mongos for TLS/SSL and TLS/SSL Configuration for Clients for more information about TLS/SSL and MongoDB.

--sslCAFile

New in version 2.6.

Specifies the .pem file that contains the root certificate chain from the Certificate Authority. Specify the file name of the .pem file using relative or absolute paths.

Changed in version 3.0: Most MongoDB distributions now include support for TLS/SSL. See Configure mongod and mongos for TLS/SSL and TLS/SSL Configuration for Clients for more information about TLS/SSL and MongoDB.

Warning

For SSL connections (--ssl) to mongod and mongos, if the mongodump runs without the --sslCAFile, mongodump will not attempt to validate the server certificates. This creates a vulnerability to expired mongod and mongos certificates as well as to foreign processes posing as valid mongod or mongos instances. Ensure that you always specify the CA file to validate the server certificates in cases where intrusion is a possibility.

--sslPEMKeyFile

New in version 2.6.

Specifies the .pem file that contains both the TLS/SSL certificate and key. Specify the file name of the .pem file using relative or absolute paths.

This option is required when using the --ssl option to connect to a mongod or mongos that has CAFile enabled without allowConnectionsWithoutCertificates.

Changed in version 3.0: Most MongoDB distributions now include support for TLS/SSL. See Configure mongod and mongos for TLS/SSL and TLS/SSL Configuration for Clients for more information about TLS/SSL and MongoDB.

--sslPEMKeyPassword

New in version 2.6.

Specifies the password to decrypt the certificate-key file (i.e. --sslPEMKeyFile). Use the --sslPEMKeyPassword option only if the certificate-key file is encrypted. In all cases, the mongodump will redact the password from all logging and reporting output.

If the private key in the PEM file is encrypted and you do not specify the --sslPEMKeyPassword option, the mongodump will prompt for a passphrase. See SSL Certificate Passphrase.

Changed in version 3.0: Most MongoDB distributions now include support for TLS/SSL. See Configure mongod and mongos for TLS/SSL and TLS/SSL Configuration for Clients for more information about TLS/SSL and MongoDB.

--sslCRLFile

New in version 2.6.

Specifies the .pem file that contains the Certificate Revocation List. Specify the file name of the .pem file using relative or absolute paths.

Changed in version 3.0: Most MongoDB distributions now include support for TLS/SSL. See Configure mongod and mongos for TLS/SSL and TLS/SSL Configuration for Clients for more information about TLS/SSL and MongoDB.

--sslAllowInvalidCertificates

New in version 2.6.

Bypasses the validation checks for server certificates and allows the use of invalid certificates. When using the allowInvalidCertificates setting, MongoDB logs as a warning the use of the invalid certificate.

Changed in version 3.0: Most MongoDB distributions now include support for TLS/SSL. See Configure mongod and mongos for TLS/SSL and TLS/SSL Configuration for Clients for more information about TLS/SSL and MongoDB.

--sslAllowInvalidHostnames

New in version 3.0.

Disables the validation of the hostnames in TLS/SSL certificates. Allows mongodump to connect to MongoDB instances even if the hostname in their certificates does not match the specified hostname.

Changed in version 3.0: Most MongoDB distributions now include support for TLS/SSL. See Configure mongod and mongos for TLS/SSL and TLS/SSL Configuration for Clients for more information about TLS/SSL and MongoDB.

--sslFIPSMode

New in version 2.6.

Directs the mongodump to use the FIPS mode of the installed OpenSSL library. Your system must have a FIPS compliant OpenSSL library to use the --sslFIPSMode option.

Note

FIPS-compatible SSL is available only in MongoDB Enterprise. See Configure MongoDB for FIPS for more information.

--username <username>, -u <username>

Specifies a username with which to authenticate to a MongoDB database that uses authentication. Use in conjunction with the --password and --authenticationDatabase options.

--password <password>, -p <password>

Specifies a password with which to authenticate to a MongoDB database that uses authentication. Use in conjunction with the --username and --authenticationDatabase options.

Changed in version 3.0.0: If you do not specify an argument for --password, mongodump returns an error.

Changed in version 3.0.2: If you wish mongodump to prompt the user for the password, pass the --username option without --password or specify an empty string as the --password value, as in --password "".
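A short sketch of the prompting behavior described for 3.0.2 and later: the function passes --username but no --password, so mongodump asks for the password instead of reading it from the command line. The function name dump_with_prompt and the MONGODUMP override variable are assumptions for illustration.

```shell
# MONGODUMP is a hypothetical override (e.g. MONGODUMP=echo for a dry run).
MONGODUMP="${MONGODUMP:-mongodump}"

# $1 = host[:port], $2 = username, $3 = output directory.
# --password is omitted on purpose so that mongodump prompts for it,
# which keeps the password out of the typed command line.
dump_with_prompt() {
  "$MONGODUMP" --host "$1" --username "$2" --out "$3"
}
```

For example: dump_with_prompt mongodb1.example.net:37017 user /opt/backup/mongodump-2011-10-24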

--authenticationDatabase

If you do not specify an authentication database, mongodump assumes that the database specified to export holds the user’s credentials.

--authenticationMechanism

Default: SCRAM-SHA-1

New in version 2.4.

Changed in version 2.6: Added support for the PLAIN and MONGODB-X509 authentication mechanisms.

Changed in version 3.0: Added support for the SCRAM-SHA-1 authentication mechanism. Changed default mechanism to SCRAM-SHA-1.

Specifies the authentication mechanism the mongodump instance uses to authenticate to the mongod or mongos.

Value                 Description
SCRAM-SHA-1           RFC 5802 standard Salted Challenge Response Authentication Mechanism using the SHA1 hash function.
MONGODB-CR            MongoDB challenge/response authentication.
MONGODB-X509          MongoDB TLS/SSL certificate authentication.
GSSAPI (Kerberos)     External authentication using Kerberos. This mechanism is available only in MongoDB Enterprise.
PLAIN (LDAP SASL)     External authentication using LDAP. You can also use PLAIN for authenticating in-database users. PLAIN transmits passwords in plain text. This mechanism is available only in MongoDB Enterprise.

--gssapiServiceName

New in version 2.6.

Specify the name of the service using GSSAPI/Kerberos. Only required if the service does not use the default name of mongodb.

This option is available only in MongoDB Enterprise.

--gssapiHostName

New in version 2.6.

Specify the hostname of a service using GSSAPI/Kerberos. Only required if the hostname of a machine does not match the hostname resolved by DNS.

This option is available only in MongoDB Enterprise.

--db <database>, -d <database>

Specifies a database to back up. If you do not specify a database, mongodump copies all databases in this instance into the dump files.

--collection <collection>, -c <collection>

Specifies a collection to back up. If you do not specify a collection, this option copies all collections in the specified database or instance to the dump files.

--query <json>, -q <json>

Provides a JSON document as a query that optionally limits the documents included in the output of mongodump.

You must enclose the query in single quotes (e.g. ') to ensure that it does not interact with your shell environment.
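The quoting rule can be illustrated with a small wrapper. This is a sketch under stated assumptions: the function name dump_matching, the MONGODUMP override variable, and the status field in the usage line are hypothetical and do not come from the mongodump documentation.

```shell
# MONGODUMP is a hypothetical override (e.g. MONGODUMP=echo for a dry run).
MONGODUMP="${MONGODUMP:-mongodump}"

# $1 = database, $2 = collection, $3 = JSON query document.
# "$3" stays double-quoted here so that the query reaches mongodump
# as one argument, exactly as the caller wrote it.
dump_matching() {
  "$MONGODUMP" --db "$1" --collection "$2" --query "$3"
}
```

A caller writes the query in single quotes so that the shell does not interpret the braces, double quotes, or $ operators inside it: dump_matching test collection '{ "status": "A" }'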

--forceTableScan

Forces mongodump to scan the data store directly: typically, mongodump saves entries as they appear in the index of the _id field. If you specify a query with --query, mongodump will use the most appropriate index to support that query.

Use --forceTableScan to skip the index and scan the data directly. Typically there are two cases where this behavior is preferable to the default:

    If you have key sizes over 800 bytes that would not be present in the _id index.
    If your database uses a custom _id field.

When you run with --forceTableScan, mongodump does not use $snapshot. As a result, the dump produced by mongodump can reflect the state of the database at many different points in time.

Important

Use --forceTableScan with extreme caution and consideration.

--out <path>, -o <path>

Specifies the directory where mongodump will write BSON files for the dumped databases. By default, mongodump saves output files in a directory named dump in the current working directory.

To send the database dump to standard output, specify “-” instead of a path. Write to standard output if you want to process the output before saving it, such as to use gzip to compress the dump. When writing to standard output, mongodump does not write the metadata that it writes to a <dbname>.metadata.json file when writing to files directly.
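As a sketch of the gzip case mentioned above, the function below pipes the standard output of mongodump into gzip. It names a single database and collection, on the assumption that a dump to standard output is limited to one collection. The function name dump_collection_gz and the MONGODUMP override variable are hypothetical.

```shell
# MONGODUMP is a hypothetical override (e.g. MONGODUMP=echo for a dry run).
MONGODUMP="${MONGODUMP:-mongodump}"

# $1 = database, $2 = collection, $3 = compressed output file.
# "--out -" sends the BSON data to standard output, which gzip compresses.
dump_collection_gz() {
  "$MONGODUMP" --db "$1" --collection "$2" --out - | gzip > "$3"
}
```

For example: dump_collection_gz test collection collection.bson.gz. In a plain POSIX shell the exit status of the pipeline is that of gzip, so a failure of mongodump itself has to be checked separately.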

--repair

Runs a repair option in addition to dumping the database. The repair option changes the behavior of mongodump to only write valid data and exclude data that may be in an invalid state as a result of an improper shutdown or mongod crash.

The --repair option uses aggressive data-recovery algorithms that may produce a large amount of duplication.

--repair is only available for use with mongod instances using the mmapv1 storage engine. You cannot run --repair with mongos or with mongod instances that use the wiredTiger storage engine. To repair data in a mongod instance using wiredTiger use mongod --repair.

--oplog

Creates a file named oplog.bson as part of the mongodump output. The oplog.bson file, located in the top level of the output directory, contains oplog entries that occur during the mongodump operation. This file provides an effective point-in-time snapshot of the state of a mongod instance. To restore to a specific point-in-time backup, use the output created with this option in conjunction with mongorestore --oplogReplay.

Without --oplog, if there are write operations during the dump operation, the dump will not reflect a single moment in time. Changes made to the database during the dump process can affect the output of the backup.

--oplog has no effect when running mongodump against a mongos instance to dump the entire contents of a sharded cluster. However, you can use --oplog to dump individual shards.

--oplog only works against nodes that maintain an oplog. This includes all members of a replica set, as well as master nodes in master/slave replication deployments.

--oplog does not dump the oplog collection.
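The pairing of --oplog with mongorestore --oplogReplay can be sketched as two small functions. The function names, the MONGODUMP and MONGORESTORE override variables, and the hosts and paths in the usage lines are assumptions for illustration only.

```shell
# Hypothetical overrides so the functions can be tried as a dry run
# (e.g. MONGODUMP=echo MONGORESTORE=echo).
MONGODUMP="${MONGODUMP:-mongodump}"
MONGORESTORE="${MONGORESTORE:-mongorestore}"

# $1 = host (or replica set seed list), $2 = output directory.
# --oplog adds oplog.bson at the top level of the output directory.
dump_point_in_time() {
  "$MONGODUMP" --host "$1" --oplog --out "$2"
}

# $1 = target host, $2 = directory produced by dump_point_in_time.
# --oplogReplay applies the entries recorded in oplog.bson after the
# data has been restored.
restore_point_in_time() {
  "$MONGORESTORE" --host "$1" --oplogReplay "$2"
}
```

For example: dump_point_in_time rs0/host1.example.net:27017 /opt/backup/pit, followed later by restore_point_in_time host2.example.net:27017 /opt/backup/pit.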

--dumpDbUsersAndRoles

Includes user and role definitions in the database’s dump directory when performing mongodump on a specific database. This option applies only when you specify a database in the --db option. MongoDB always includes user and role definitions when mongodump applies to an entire instance and not just a specific database.

--excludeCollection array of strings

New in version 3.0.0.

Specifies collections to exclude from the output of mongodump.

--excludeCollectionsWithPrefix array of strings

New in version 3.0.0.

Excludes all collections from the output of mongodump with a specified prefix.
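A minimal sketch of excluding several collections, assuming that --excludeCollection is repeated once per collection and used together with --db. The function name dump_excluding, the MONGODUMP override variable, and the collection names in the usage line are hypothetical.

```shell
# MONGODUMP is a hypothetical override (e.g. MONGODUMP=echo for a dry run).
MONGODUMP="${MONGODUMP:-mongodump}"

# $1 = database; remaining arguments = collections to leave out.
dump_excluding() {
  db="$1"
  shift
  n=$#
  while [ "$n" -gt 0 ]; do
    # Move the first name to the end of the argument list,
    # preceded by its own --excludeCollection flag.
    name="$1"
    shift
    set -- "$@" --excludeCollection "$name"
    n=$((n - 1))
  done
  "$MONGODUMP" --db "$db" "$@"
}
```

For example, dump_excluding test users salaries runs mongodump --db test --excludeCollection users --excludeCollection salaries.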

Use

For an overview of mongodump usage, see Back Up and Restore with MongoDB Tools.

For an overview of mongorestore, which provides the related inverse functionality, see the mongorestore document.

The following command creates a dump file that contains only the collection named collection in the database named test. In this case the database is running on the local interface on port 27017:

mongodump --db test --collection collection

In the next example, mongodump creates a database dump located at /opt/backup/mongodump-2011-10-24, from a database running on port 37017 on the host mongodb1.example.net and authenticating using the username user and the password pass, as follows:

mongodump --host mongodb1.example.net --port 37017 --username user --password pass --out /opt/backup/mongodump-2011-10-24

Additional Resources

  • Backup and its Role in Disaster Recovery White Paper
  • Cloud Backup through MongoDB Cloud Manager
  • Blog Post: Backup vs. Replication, Why you Need Both
  • Backup Service with Ops Manager, an on-premise solution available in MongoDB Enterprise Advanced

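Translation

The importance of data goes without saying, so backing it up regularly is essential. MongoDB can be backed up in three ways, each described below:

  1. File system snapshots
  2. Copying the data files
  3. Using mongodump

I. Backing up a single MongoDB instance

1. File system snapshots

This is the simplest backup method. However, the file system must support snapshots and mongod must have journaling enabled. If both requirements are met, you can create a snapshot at any time. To restore, make sure mongod is not running, run the snapshot restore command, and then start the mongod process; mongod replays the journal.

2. Copying the data files

Copy everything under the data directory. The data files must not change while they are being copied, so lock the database to prevent writes: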

> db.fsyncLock()

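The command above blocks write operations and flushes dirty data to disk, so that the data on disk is consistent. Then copy the data files to the backup directory:

# cp -R /data/db/* /backup

When the copy is complete, unlock the database to allow writes again:

> db.fsyncUnlock()

Note: Do not close the current shell window between running db.fsyncLock() and db.fsyncUnlock(); otherwise you may be unable to connect and may have to restart the mongod service.

To restore, make sure mongod is not running, empty the data directory, copy the backed-up data into the data directory, and then start mongod: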

# cp -R /backup/* /data/db/
# mongod -f mongod.conf

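3. Using mongodump

Never combine fsyncLock with mongodump: if the database is locked, mongodump hangs indefinitely. Backing up with mongodump is relatively slow, and it has some problems when backing up replica sets, as discussed later. It is nevertheless a good way to back up a single database, a collection, or a subset of a collection.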

# ./mongodump --help

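options:
  --help                                show help information
  -v [ --verbose ]                      print more information, such as timings; repeat for more (e.g. -vvvvv)
  --version                             print version information
  -h [ --host ] arg                     MongoDB host to connect to; for a replica set use <set name>/s1,s2
  --port arg                            MongoDB port; can also be given as --host hostname:port
  --ipv6                                enable IPv6 support
  -u [ --username ] arg                 username
  -p [ --password ] arg                 password
  --authenticationDatabase arg          user source (defaults to dbname)
  --authenticationMechanism arg (=MONGODB-CR)
                                        authentication mechanism
  --dbpath arg                          access the mongod database files directly instead of
                                        connecting to a MongoDB server; needs to lock the data
                                        directory, so it cannot be used while a mongod is
                                        accessing the same path (that is, use --dbpath only
                                        when mongod is not running)
  --directoryperdb                      each db is in a separate directory; requires --dbpath
  --journal                             enable journaling; requires --dbpath
  -d [ --db ] arg                       database to use
  -c [ --collection ] arg               collection to use
  -o [ --out ] arg (=dump)              output directory, or "-" for standard output
  -q [ --query ] arg                    JSON query
  --oplog                               use the oplog to produce a point-in-time snapshot
  --repair                              try to recover a crashed database
  --forceTableScan                      force a table scan; do not use $snapshot

# mongodump --port 27017

This creates a dump directory under the current directory and backs up all databases. All data is stored in .bson files, which you can inspect with the bsondump tool that ships with MongoDB. When mongod is not running (only in versions before 3.0.0, which still have --dbpath):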

# mongodump --dbpath /data/db

To restore, use the mongorestore tool:

# ./mongorestore --help   // options shared with mongodump have the same meaning as above
  -v [ --verbose ]                      
  --version                             
  -h [ --host ] arg                     
  --port arg                            
  --ipv6                                
  -u [ --username ] arg                 
  -p [ --password ] arg                 
  --authenticationDatabase arg          
  --authenticationMechanism arg (=MONGODB-CR)
  --dbpath arg                         
  --directoryperdb                      
  --journal 
  -d [ --db ] arg  
  -c [ --collection ] arg  
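  --objcheck                            validate objects before inserting (enabled by default)
  --noobjcheck                          do not validate objects before inserting
  --filter arg                          filter to apply before inserting
  --drop                                drop each collection before inserting
  --oplogReplay                         replay the oplog during the restore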
  --oplogLimit arg                      include oplog entries before the 
                                        provided Timestamp (seconds[:ordinal]) 
                                        during the oplog replay; the ordinal 
                                        value is optional
  --keepIndexVersion                    don't upgrade indexes to newest version
  --noOptionsRestore                    don't restore collection options
  --noIndexRestore                      don't restore indexes
  --w arg (=0)                          minimum number of replicas per write

Restore all databases:

# mongorestore --port 27017 dump/

Restore into a specific database and collection:

# mongorestore --db ttlsa_com --collection posts dump/old_ttlsa_com/old_posts.bson

Note: The mongodump and mongorestore versions should match.

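II. Backing up a replica set

Backups are usually taken on a secondary to reduce the load on the primary; only the secondary is locked, so the application is not affected (assuming no read requests are sent to the secondary). Any of the methods above can be used, although file system snapshots and copying the data files are recommended.

As mentioned above, backing up with mongodump has one problem: writes can occur while the dump is running. To handle this in a replica set, run mongodump with the --oplog option, which records all operations that occur on the server during the backup and yields a point-in-time snapshot; otherwise the state of the backup will not match the other members of the set. When restoring, you must also create the oplog and pass --oplogReplay to apply those operations; otherwise the restored member does not know where to start syncing from and cannot become consistent with the source server as of a single point in time.

When backing up a replica set, you can have mongodump connect to "setName/s1,s2,s3", and it automatically selects an available secondary for the backup.

Back up a replica set: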

# mongodump -h "ttlsa/10.1.1.155,10.1.1.156,10.1.1.157" --oplog -o /backup/mongodbbackup/

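Steps to restore a replica set member:

1. Remove the member to be restored from the replica set

2. Run mongorestore with --oplogReplay

# mongorestore --oplogReplay dump/

3. Create the oplog

> use local
> db.createCollection("oplog.rs", {"capped" : true, "size" : 10000000})

4. Restore the oplog

# mongorestore -d local -c oplog.rs dump/oplog.bson

Note: oplog.bson is not the file dump/local/oplog.rs.bson; oplog.bson records the operations that occurred while mongodump was running.

5. Add the node back to the replica set

For replica set operations, see http://www.ttlsa.com/html/1679.html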

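III. Backing up a sharded cluster

In a sharded cluster it is not possible to take a snapshot of the complete cluster state at a single point in time. As the cluster grows, it also becomes less and less likely that the whole deployment will ever need to be restored from backup. To back up a sharded cluster, therefore, back up the config server and each shard's replica set separately. Turn off the balancer before backing up or restoring a sharded cluster.

In most cases only a single node in the cluster needs to be restored. If you do have to restore the whole cluster, you have been quite unlucky; losing the data of the entire cluster is unlikely. When backing up, connect directly to the mongod instances of the cluster rather than going through mongos.

For a fairly small sharded cluster, you can instead connect mongodump to mongos to back up; the backup then contains both the config server metadata and the actual data.

For a large sharded cluster, the backup steps are as follows:

1. Stop the balancer

Note: Connect to mongos, not to a config server instance.

> sh.setBalancerState(false)   // or
> sh.stopBalancer()            // or
> use config
> db.settings.update( { _id: "balancer" }, { $set : { stopped: true } } , true );

2. Back up the cluster metadata

Use mongodump to back up any one of the config servers.

You can connect directly to any config mongod instance, or connect through mongos.

# mongodump --db config

3. Back up each shard's replica set

These backups can run in parallel.

4. Re-enable the balancer

Note: Connect to mongos, not to a config server instance.

> sh.setBalancerState(true)   // or
> sh.startBalancer()          // or
> use config
> db.settings.update( { _id: "balancer" }, { $set : { stopped: false } } , true );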

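The next article, "MongoDB Backup and Restore (Part 2)", will provide a production backup script (suitable for any MongoDB architecture) and a way to implement incremental backups.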
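In the meantime, the four steps for a large sharded cluster (stop the balancer, dump the config database, dump each shard, restart the balancer) can be sketched as a single function. This is an illustration under stated assumptions, not the script promised for the next article: the function name backup_sharded_cluster, the MONGO and MONGODUMP override variables, the use of --oplog for each shard, and the output directory layout are all hypothetical choices.

```shell
# Hypothetical overrides so the function can be tried as a dry run
# (e.g. MONGO=echo MONGODUMP=echo).
MONGO="${MONGO:-mongo}"
MONGODUMP="${MONGODUMP:-mongodump}"

# $1 = mongos host, $2 = one config server host, $3 = output directory,
# remaining arguments = one --host value per shard replica set.
backup_sharded_cluster() {
  mongos_host="$1"
  config_host="$2"
  out_dir="$3"
  shift 3

  # 1. Stop the balancer; connect to mongos, not to a config server.
  "$MONGO" --host "$mongos_host" --eval 'sh.stopBalancer()' || return 1

  # 2. Back up the cluster metadata from one config server.
  "$MONGODUMP" --host "$config_host" --db config --out "$out_dir/config"
  rc=$?

  # 3. Back up each shard's replica set; the dumps run in parallel.
  i=0
  for shard in "$@"; do
    i=$((i + 1))
    "$MONGODUMP" --host "$shard" --oplog --out "$out_dir/shard$i" &
  done
  wait  # waits for all background dumps; does not report their exit codes

  # 4. Re-enable the balancer even if the config dump failed.
  "$MONGO" --host "$mongos_host" --eval 'sh.startBalancer()' || rc=1
  return "$rc"
}
```

For example: backup_sharded_cluster mongos.example.net:27017 cfg1.example.net:27019 /backup/cluster "rs0/s1.example.net:27018" "rs1/s2.example.net:27018". Because wait without arguments does not report per-job failures, a production script would need to check each shard dump individually.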
