Data Pump

Data Pump Oracle
Data Pump utility in Oracle

Oracle Data Pump was brought in Oracle Database 10g,
as replacement for the original Export and Import utilities.

From Oracle Database 10g, new Data Pump Export (expdp) and Import (impdp) clients that use this interface have been provided. Oracle recommends that customers use these new Data Pump Export and Import clients rather than the Original Export and Original Import clients, since the new utilities have vastly improved performance and greatly enhanced functionality.

Oracle Data Pump provides high speed, parallel, bulk data and metadata movement of Oracle database contents. It’s a server-side replacement for the original Export and Import utilities. A new public interface package, DBMS_DATAPUMP, provides a server-side infrastructure for fast data and metadata movement.

Data Pump will make use of streams_pool_size.

Data Pump is available on the Standard Edition, Enterprise Edition and Personal Edition. However, the parallel capability is only available on Oracle 10g and Oracle 11g Enterprise Editions. Data Pump is included on all the platforms supported by Oracle 10g, Oracle 11g.

Original Export and Import Limitations

  • Design not scalable to large databases
  • Slow unloading and loading of data, no parallelism
  • Difficult to monitor job progress
  • Limited flexibility in object selection
  • No callable API
  • Difficult to maintain
  • Non-restartable
  • Client-side, single threaded execution
  • Limited tuning mechanisms
  • Limited object filtering

The Data Pump system requirements are the same as the standard Oracle Database 10g requirements. Data Pump doesn’t need a lot of additional system or database resources, but the time to extract and treat the information will be dependent on the CPU and memory available on each machine. If system resource consumption becomes an issue while a Data Pump job is executing, the job can be dynamically throttled to reduce the number of execution threads.

Using the Direct Path method of unloading, a single stream of data unload is about 2 times faster than normal Export because the Direct Path API has been modified to be even more efficient. Depending on the level of parallelism, the level of improvement can be much more.

A single stream of data load is 15-45 times faster than normal Import. The reason it is so much faster is that Conventional Import uses only conventional mode inserts, whereas Data Pump Import uses the Direct Path method of loading. As with Export, the job can be parallelized for even more improvement.

Data Pump Features

1.Writes either
– Direct Path unloads
– External tables (part of cluster, has LOB, etc)
2.Command line interface
3.Writing to external tables
6.Checkpoint / Job Restart

  • Job progress recorded in Master Table – All stopped Data Pump jobs can be restarted without loss of data as long as the master table and dump file set remain undisturbed while the job is stopped. It doesn’t matter if the job was stopped voluntarily by a user of if the stoppage was involuntary due to a crash,power outage, etc.
  • May be stopped and restarted later
  • Abnormally terminated job is also restartable
  • Current objects can be skipped on restart if problematic

7.Better Job Monitoring and Control

  • Can detach from and attach to running jobs from any location – Multiple clients can attach to a job to see what is going on. Clients may also detach from an executing job without affecting it.
  • Initial job space estimate and overall percent done – At Export time, the approximate size of the job is estimated before it gets underway. The default method for determining this is to estimate the size of a partition by counting the number of blocks currently allocated to it. If tables have been analyzed, statistics can also be used which should provide a more accurate estimate. The user gets an estimate of how much dump file space will be consumed for the operation.
  • Job state and description – Once the Export begins, the user can get a status of the job by seeing a percentage of how far along he is in the job. He can then extrapolate the time required to get the job completed.
  • Per-worker status showing current object and percent done
  • Enterprise Manager interface available – The jobs can be monitored from any location

8. Interactive Mode for expdp and impdp clients

  • PARALLEL: add or remove workers
  • ADD_FILE: add dump files and wildcard specs
  • STATUS: get detailed per-worker status
  • STOP_JOB {=IMMEDIATE}: stop the job, leaving it restartable, immediate doesn’t wait for workers to finish current work items.
  • START_JOB: Restart a previously stopped job, can change reporting interval
  • KILL_JOB: stop job and delete all its resources, leaving it unrestartable, the master table and dump files are deleted
  • CONTINUE: leave interactive mode, continue logging
  • EXIT: exit client, leave job running

All modes of operation are supported: full, schema, table, tablespace, and transportable tablespace.

9.Dumpfile Set Management

  • Directory based: e.g., DMPDIR:export01.dmp where DMPDIR is external directory
  • Can specify maximum size of each dumpfile
  • Can dynamically add dumpfiles to jobs – If a job ran out of space, can use ADD_FILE command and specify a FILESIZE value
  • Wildcard file specs supported – Wildcard file support makes it easy to spread the I/O load over multiple spindles:
    e.g.: Dumpfile=dmp1dir:full1%u.dmp, dmp2dir:full2%u.dmp
  • Dump file compression of metadata – Metadata is compressed before being written to the dumpfile set COMPRESSION=METADATA_ONLY
  • In Oracle Database 11g, this compression capability has been extended so that we can now compress table data on export. Data Pump compression is an inline operation, so the reduced dumpfile size means a significant savings in disk space.
  • Automatically uncompressed during Import. Data Pump compression is fully inline on the import side as well, so there is no need to uncompress a dumpfile before importing it.
  • Dumpfile set coherency automatically maintained
  • Data Pump supplies encryption options for more flexible and robust security.

10. Network Mode
Data Pump Export and Import both support a network mode in which the job’s source is a remote Oracle instance. This is an overlap of unloading the data, using Export, and loading the data, using Import, so those processes don’t have to be serialized. A database link is used for the network. We don’t have to worry about allocating file space because there are no intermediate dump files.

Network Export
– Unload a remote database to a local dump file set
– Allows export of read-only databases for archiving

Network Import
– Overlap execution of extract and load
– No intermediate dump files

Because Data Pump maintains a Master Control Table and must perform database writes, Data Pump can’t directly Export a Read-only database. Network mode allows the user to export Read-Only databases: The Data Pump Export job runs locally on a read/write instance and extracts the data and metadata from the remote read-only instance. Both network export and import use database links to communicate with the remote source.

First level parallelism is supported for both network export and import. I/O servers do not operate remotely, so second level, intra-partition parallelism is not supported in network operations.

11.Fine-Grained Object Selection

  • All object types are supported – With the new EXCLUDE and INCLUDE parameters, a Data Pump job can include or exclude any type of object and any subset of objects within a type.
  • Exclude parameter: specified object types are excluded from the operation
  • Include parameter: only the specified object types are included
  • Both take an optional name filter for even finer granularity:
    EXCLUDE TABLE: “in (‘FOO’,’BAR’,…)’”
    EXCLUDE=package:”like ‘PAYROLL%’ “
    Would exclude all functions, procedures, and packages with names starting with ‘PAYROLL’ from the job.

Using INCLUDE instead of EXCLUDE above, would include the functions, procedures, and packages with names starting with ‘PAYROLL’.

12.DDL Transformations
Easy with XML, because object metadata is stored as XML in the dump file set,it is easy to apply transformations when DDL is being formed (via XSL-T) during import.

  • REMAP_SCHEMA -> REMAP_SCHEMA provides the old ‘FROMUSER / TOUSER’ capability to change object ownership.
  • REMAP_TABLESPACE -> REMAP_TABLESPACE allows objects to be moved from one tablespace to another. This changes the tablespace definition as well
  • REMAP_DATAFILE -> REMAP_DATAFILE is useful when moving databases across platforms that have different file system semantics.
  • Segment and storage attributes can be suppressed -> The TRANSFORM parameter can also be used so that storage clauses are not generated in the DDL. This is useful if the storage characteristics of the target instance are very different from those of the source.

Data Pump Benefits (advantages over normal export & import)

    • Restartable
    • Improved control
    • Files will created on server, not on client side
    • Parallel execution
    • Automated performance tuning
    • Simplified monitoring
    • Improved object filtering
    • Dump will be compressed
    • Data can be encrypted (in Oracle 11g or later)
    • Remap of data during export or import (in 11g or later)
    • We can export one or more partitions of a table without having to move the entire table (in 11g or later)
    • XML schemas and XMLType columns are supported for both export and import (in 11g or later)
    • Using the Direct Path method of unloading or loading data, a single stream of Data Pump export (unload) is approximately 2 times faster than original Export, because the Direct Path API has been modified to be even more efficient. Depending on the level of parallelism, the level of improvement can be much more.
    • Original Import uses only conventional mode inserts, so a single stream of Data Pump Import is 10-45 times faster than normal Import. As with Export, the job’s single stream can be changed to parallel streams for even more improvement.
    • With Data Pump, it is much easier for the DBA to manage and monitor jobs. During a long-running job, the DBA can monitor a job from multiple locations and know how far along it is, how much there is left to go, what objects are being worked on, etc. The DBA can also affect the job’s operation, i.e. abort it, adjust its resource consumption, and stop it for later restart.
    • Since the jobs are completed much more quickly than before, production systems have less downtime.
    • Data Pump is publicly available as a PL/SQL package (DBMS_DATAPUMP), so customers can write their own data movement utilities if so desired. The metadata capabilities of the Data Pump are also available as a separate PL/SQL package, DBMS_METADATA.
  • While importing, if destination schema is not existed, Data Pump will create the user and import the objects.

Data Pump requires no special tuning. Data Pump runs optimally “out of the box”. Original Export and (especially) Import require careful tuning to achieve optimum results. There are no Data Pump performance tuning parameters other than the ability to dynamically adjust the degree of parallelism.

We can dynamically throttle the number of threads of execution throughout the lifetime of the job. There is an interactive command mode where we can adjust the level of parallelism. For example, we can start up a job during the day with a PARALLEL=2, and then increase it at night to a higher level.

All the Oracle database data types are supported via Data Pump’s two data movement mechanisms, Direct Path and External Tables.

With Data Pump, there is much more flexibility in selecting objects for unload and load operations. We can now unload any subset of database objects (such as functions, packages, and procedures) and reload them on the target platform. Almost all database object types can be excluded or included in an operation using the new Exclude and Include parameters.

We can either use the Command line interface or the Oracle Enterprise Manager web-based GUI interface.

Data Pump handles all the necessary compatibility issues between hardware platforms and operating systems.

Oracle Data Pump supports Oracle Apps 11i.

We can use the “ESTIMATE ONLY” command to see how much disk space is required for the job’s dump file set before we start the operation.

Jobs can be monitored from any location is going on. Clients may also detach from an executing job without affecting it.

Every Data Pump job creates a Master Table in which the entire record of the job is maintained. The Master Table is the directory to the job, so if a job is stopped for any reason, it can be restarted at a later point in time, without losing any data. Whenever Data Pump export or import is running, Oracle will create a table with the JOB_NAME and will be deleted once the job is done. From this table, Oracle will find out how much job has been completed and from where to continue etc.

With Data Pump, it is now possible to change the definition of some objects as they are created at import time. For example, we can remap the source datafile name to the target datafile name in all DDL statements where the source datafile is referenced. This is really useful if we are moving across platforms with different file system syntax.

Data Pump supports the Flashback infrastructure, so we can perform an export and get a dumpfile set that is consistent with a specified point in time or SCN.

Data Pump Vs SQL*Loader
We can use SQL*Loader to load data from external files into tables of an Oracle database. Many customers use SQL*Loader on a daily basis to load files (e.g. financial feeds) into their databases. Data Pump Export and Import may be used less frequently, but for very important tasks, such as migrating between platforms, moving data between development, test, and production databases, logical database backup, and for application deployment throughout a corporation.

Data Pump Vs Transportable Tablespaces
We can use Transportable Tablespaces when we want to move an entire tablespace of data from one Oracle database to another. Transportable Tablespaces allows Oracle data files to be unplugged from a database, moved or copied to another location, and then plugged into another database. Moving data using Transportable Tablespaces can be much faster than performing either an export or import of the same data, because transporting a tablespace only requires the copying of datafiles and integrating the tablespace dictionary information. Even when transporting a tablespace, Data Pump Export and Import are still used to handle the extraction and recreation of the metadata for that tablespace.

The various background processes related to Data Pump are:
Data Pump Master Process (DMnn) : Creates and deletes the master table at the time of export and import. Master table contains the job state and object information.
Coordinates the Data Pump job tasks performed by Data Pump worker processes and handles client interactions. The Data Pump master (control) process is started during job creation and coordinates all tasks performed by the Data Pump job. It handles all client interactions and communication, establishes all job contexts, and coordinates all worker process activities on behalf of the job. Creates the Worker Process.
Data Pump Worker Process (DWnn) : It performs the actual heavy duty work of loading and unloading of data. It maintains the information in master table. The Data Pump worker process is responsible for performing tasks that are assigned by the Data Pump master process, such as the loading and unloading of metadata and data.
•Shadow Process : When client logs in to an Oracle Server the database creates and Oracle process to service Data Pump API.
•Client Process : The client process calls the Data pump API.

Data Pump Disadvantages
•Can’t use UNIX pipes
•Can’t run as SYS (/ as sysdba)

Related Views