Illegal partition exception in sqoop for incremental imports

Sqoop is an amazing Apache tool that is widely used to import/export data between Hadoop and relational databases. I have used it extensively while developing a data warehouse, pulling in data from a variety of relational databases.

[Image: Role of Sqoop in Hadoop]

On top of that, Sqoop is usually amazingly easy to use for imports and exports, except when it becomes hard to understand what goes on behind the scenes, beyond what you can see and understand from the logs. And yes, that happens (sometimes, at least).

Incremental import in sqoop

Incremental import is a sqoop feature that imports only the data that has not been imported before. It is an optimization, you see? Why would someone want to import the same data over and over, unless you really need full snapshots?
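For illustration, here is a minimal sketch of a one-off incremental import in "append" mode, which pulls only rows whose check column exceeds the saved last-value. The IP, DB, USER_NAME and LOCATION placeholders follow the same convention as the job shown further below:

sqoop import --connect jdbc:mysql://IP/DB --username USER_NAME -P --table test_table --incremental append --check-column id --last-value 0 --target-dir LOCATION

"append" mode tracks a monotonically increasing column such as an auto-increment id; the "lastmodified" mode used later in this post tracks a timestamp column instead, and can merge updated versions of existing rows.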

For incremental imports, I usually write sqoop jobs. A sqoop job is a way to create and save the usual import/export commands. Using incremental import of course means re-executing the same command repeatedly, each time with an updated value of the parameter that identifies the last value/date-time of the previous import, so that the import can continue from that point. This value is stored in the sqoop metastore, so the job automatically fetches it from the metastore each time you execute it.
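A couple of metastore commands are handy for inspecting saved jobs (a sketch; the FQDN placeholder and job name mirror the job shown below):

sqoop job --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop --list
sqoop job --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop --show test_table

The --show option prints the saved job definition, including the incremental last-value that the metastore updates after every successful run.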

Sqoop job for incremental imports

My typical sqoop job for incremental imports from MySQL to HDFS looks like this:

sqoop job --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop --create test_table -- import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToNull --username USER_NAME --password 'PASSWORD' --table test_table --merge-key id --split-by id --target-dir LOCATION --incremental lastmodified --last-value 0 --check-column updated_at

While I was trying incremental imports with sqoop jobs of the sort mentioned above, I noticed the following behaviour.

The first import works fine, because the --merge-key argument comes into the picture only when we are importing to a location where data already exists (we specify that location through the --target-dir argument).
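You can verify which case you are in by checking whether the target directory already holds data from a previous run (LOCATION being the same placeholder as in the job above):

hdfs dfs -ls LOCATION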

Note: the job works just fine when we execute it through the command line, though:

Sqoop job execution

sqoop job --exec test_table

However, when I executed it through Oozie, the job completed with a successful status. I had to dig deeper into the logs, only to find an offending piece of log in the Map-Reduce logs:
>>> Invoking Sqoop command line now >>>
5373 [uber-SubtaskRunner] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
5407 [uber-SubtaskRunner] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.6-cdh5.7.0
5702 [uber-SubtaskRunner] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead.
5715 [uber-SubtaskRunner] WARN org.apache.sqoop.ConnFactory - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
5740 [uber-SubtaskRunner] WARN org.apache.sqoop.ConnFactory - Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
5754 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Using default fetchSize of 1000
5754 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.CodeGenTool - Beginning code generation
6091 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM test_table AS t WHERE 1=0
6098 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM test_table AS t WHERE 1=0
6118 [uber-SubtaskRunner] INFO org.apache.sqoop.orm.CompilationManager - HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/hadoop-mapreduce
8173 [uber-SubtaskRunner] INFO org.apache.sqoop.orm.CompilationManager - Writing jar file: /tmp/sqoop-yarn/compile/454902ac78d49b783a1f51b7bfe0a2be/test_table.jar
8185 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM test_table AS t WHERE 1=0
8192 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Incremental import based on column updated_at
8192 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Lower bound value: '2016-07-02 17:13:24.0'
8192 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Upper bound value: '2016-07-02 17:16:56.0'
8194 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Beginning import of test_table
8214 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM test_table AS t WHERE 1=0
8230 [uber-SubtaskRunner] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies.
8716 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.db.DBInputFormat - Using read commited transaction isolation
8717 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - BoundingValsQuery: SELECT MIN(id), MAX(id) FROM test_table WHERE ( updated_at >= '2016-07-02 17:13:24.0' AND updated_at < '2016-07-02 17:16:56.0' )
8721 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.db.IntegerSplitter - Split size: 0; Num splits: 4 from: 1 to: 1
25461 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Transferred 26 bytes in 17.2192 seconds (1.5099 bytes/sec)
25471 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Retrieved 1 records.
25536 [uber-SubtaskRunner] WARN org.apache.sqoop.mapreduce.ExportJobBase - IOException checking input file header: java.io.EOFException
25550 [uber-SubtaskRunner] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies.
Heart beat
Heart beat
70628 [uber-SubtaskRunner] ERROR org.apache.sqoop.tool.ImportTool - Merge MapReduce job failed!
70628 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Saving incremental import state to the metastore
70831 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Updated data for job: test_table

See how the merge MapReduce job failed?

70628 [uber-SubtaskRunner] ERROR org.apache.sqoop.tool.ImportTool - Merge MapReduce job failed!

Looking further into the AM container logs revealed this:

Error: java.lang.IllegalArgumentException
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:330)
    at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:51)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1848)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1508)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Error: java.io.IOException: Illegal partition for 3 (-2)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1083)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:715)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.sqoop.mapreduce.MergeMapperBase.processRecord(MergeMapperBase.java:82)
    at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:58)
    at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:34)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Error: java.lang.IllegalArgumentException
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:330)
    at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:51)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1848)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1508)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Error: java.io.IOException: Illegal partition for 1 (-2)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1083)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:715)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.sqoop.mapreduce.MergeMapperBase.processRecord(MergeMapperBase.java:82)
    at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:58)
    at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:34)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)

When it says "Illegal partition for 1", the 1 refers to the merge-key value. The number in parentheses, -2, is the partition index computed for that key; MapReduce requires that index to lie between 0 and the number of reducers minus one, so a negative index points to a misconfigured partitioner or reducer count in the merge job.

Now, here are a few points to note:

  1. A Sqoop import job has no reducers, but a Sqoop merge job does have reducers.
  2. The Oozie Sqoop job seems to fail because of the lack of a partitioner class definition in the job configuration.
  3. A possibility is that Oozie reuses the jobConf object between the Sqoop import and merge jobs, and thus accidentally sets the number of reducers in the merge job to 0. By explicitly setting the number of reducers to 1, we can work around this.

So to explicitly set the number of reducers, this is what your sqoop job looks like:

sqoop job --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop --create test_table -- import -Dmapred.reduce.tasks=1 --driver com.mysql.jdbc.Driver --connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToNull --username USER_NAME --password 'PASSWORD' --table test_table --merge-key id --split-by id --target-dir LOCATION --incremental lastmodified --last-value 0 --check-column updated_at
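One caveat: in my experience, sqoop job --create refuses to overwrite a job that already exists in the metastore, so if you are updating an existing job definition, delete the old one first (same FQDN placeholder as above):

sqoop job --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop --delete test_table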

That is it.
Using the property -Dmapred.reduce.tasks=1, you can fix the "Illegal partition for key" problem. However, when the data grows, one reducer is not going to be sufficient for imports and merges over considerably large data, so make sure to scale that number up as the data grows.
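For example, the only change needed in the job definition above would be the value of that one property (the 4 here is purely illustrative; tune it to your data volume and cluster capacity):

-Dmapred.reduce.tasks=4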

I particularly feel it is related to how the sqoop configuration is shared with Oozie. Though I am yet to figure out the exact reason, the workaround above is a temporary fix. I also welcome your views on it, since ideally it should work without having to explicitly set the number of reducers.

P.S: The Internet is a thriving community thanks to the generous contributions of people from across the globe. You can help it thrive too. Please contribute. As a reader, you can contribute with your valuable feedback.
Please drop a comment or share to let people know you were here. 🙂 🙂
