How to use a single key pair to SSH within a cluster

I have to configure a Hadoop cluster. For that, all systems must be able to SSH into each other without a password. For security reasons, I have allowed only key-based SSH (no password). There are 5 systems in the cluster. I have generated ...

2017-10-04 08:10 (0) Answers

Read large MongoDB data

I have a Java application that needs to read a large amount of data from MongoDB 3.2 and transfer it to Hadoop. This batch application runs every 4 hours (6 times a day). Data specifications: Documents: 80,000 at a time (every 4 hours); Size: 3 GB ...

2017-09-28 11:09 (1) Answers

Facebook Presto vs. Teradata Presto

I want to use Presto as a SQL engine over Hadoop, queried from MicroStrategy. I see there are two distributions: Facebook Presto and Teradata Presto. What are the differences between them, and which one should I use? ...

2017-09-06 16:09 (1) Answers

Data Validation in Hive

In our application we are migrating a huge volume of data from Teradata to Hive and need to validate the data between source and target. We are planning to do it using Python and pandas DataFrames. My questions are: 1. Can a pandas DataFrame handle ar...

2017-08-14 16:08 (0) Answers
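For the Teradata-to-Hive validation question above, one lightweight alternative to holding both full tables in pandas is to compare a row count plus an order-independent checksum computed on each side. The sketch below assumes the rows have already been fetched as Python tuples (for example from each system's DB-API cursor) and that values render identically as strings on both sides; table_fingerprint and tables_match are hypothetical helper names, not part of any library.

```python
import hashlib
from typing import Iterable, Sequence, Tuple

Row = Sequence[object]


def table_fingerprint(rows: Iterable[Row]) -> Tuple[int, str]:
    """Hypothetical helper: return (row_count, order-independent digest)."""
    count = 0
    total = 0
    for row in rows:
        # Assumption: str() gives the same text for a value on both sides
        # (NULL/None is rendered as the empty string). Normalise types,
        # decimals and timestamps here if Teradata and Hive format them differently.
        key = "\x1f".join("" if value is None else str(value) for value in row)
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        # Summing per-row hashes modulo 2**64 ignores row order, and unlike XOR
        # it does not let duplicate rows cancel each other out.
        total = (total + int.from_bytes(digest[:8], "big")) % (1 << 64)
        count += 1
    return count, format(total, "016x")


def tables_match(source_rows: Iterable[Row], target_rows: Iterable[Row]) -> bool:
    """Hypothetical helper: True if row counts and checksums agree."""
    return table_fingerprint(source_rows) == table_fingerprint(target_rows)


source = [(1, "apple", None), (2, "banana", 3.5)]
target = [(2, "banana", 3.5), (1, "apple", None)]
print(tables_match(source, target))  # True: same rows, different order
```

Because each row is hashed in a single streaming pass, the rows can be fed in batches from a cursor instead of being materialized in a DataFrame. A mismatch only says that the tables differ; locating the differing rows would need a finer-grained comparison, such as one fingerprint per partition or key range.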

Running "hbase shell" gives an error on OS X

Getting the following error when trying to run hbase shell on OS X (version 10.11.4): warning: -J-Dfile.encoding=UTF-8 argument ignored (launched in same VM?) warning: -J-XX:MaxPermSize=1024m argument ignored (launched in same VM?) warning:...

2017-07-25 12:07 (0) Answers

Error in hadoop jobs due to hive query error

Exception: 2017-06-21 22:47:49,993 FATAL ExecMapper (main): org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable org.apache.hadoop.dynamodb.DynamoDBItemWritable@2e17578f at org.apache.hadoop.hive.ql.exec...

2017-06-22 01:06 (3) Answers

MapReduce sort by value in descending order

I'm trying to write, in pseudocode, a MapReduce task that returns the items sorted in descending order. For example, for the word count task, instead of getting "apple 1, banana 3, mango 2", I want the output to be "banana 3, mango 2, apple 1". Any ideas...

2017-06-21 17:06 (1) Answers
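For the descending-sort question above, a common pattern is to chain two jobs: the usual word count, then a second job whose mapper swaps each (word, count) pair to (count, word) so that the framework's sort phase orders on the count, with the key comparison reversed for descending order. The Python sketch below only simulates the map, shuffle/sort and reduce phases in memory to show the data flow. It assumes a single reducer in the second job (with several reducers you would need a total-order partitioner to get one globally sorted output), and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def shuffle(pairs: Iterable[Tuple], descending: bool = False) -> Iterator[Tuple[object, List]]:
    """Simulated shuffle/sort: group values by key and emit keys in sorted order."""
    groups: Dict[object, List] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    for key in sorted(groups, reverse=descending):
        yield key, groups[key]


def sorted_word_count(lines: Iterable[str]) -> List[Tuple[str, int]]:
    # Job 1, map: emit (word, 1) for every word.
    mapped = ((word, 1) for line in lines for word in line.split())
    # Job 1, reduce: sum the ones for each word.
    counts = [(word, sum(ones)) for word, ones in shuffle(mapped)]

    # Job 2, map: swap to (count, word) so the count becomes the sort key.
    swapped = ((count, word) for word, count in counts)
    result = []
    # Job 2, shuffle with a reversed key comparison (descending counts),
    # then reduce: swap back so each output line reads "word count".
    for count, words in shuffle(swapped, descending=True):
        for word in words:
            result.append((word, count))
    return result


for word, count in sorted_word_count(["apple banana mango", "banana mango banana"]):
    print(word, count)
# banana 3
# mango 2
# apple 1
```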

proto2 with Spark cannot run

I have a proto file with syntax proto2. Also, I need to use Spark (2.0.2) and HBase. My project is built using Gradle. Right now, when I run my Java code, I get this error: Exception in thread "main" org.apache.spark.SparkException: Job aborted due t...

2017-05-23 01:05 (0) Answers

What is a keytab exactly?

I am trying to understand how Kerberos works and came across a file called a keytab, which I believe is used to authenticate to the KDC server. Just as every user and service (say, Hadoop) in a Kerberos realm has a principal, does ev...

2017-05-09 09:05 (1) Answers

Kerberos authentication for Windows R

I am trying to connect to my HDP cluster from RStudio Desktop (Windows) using the SparkR package. Spark init is failing with a "no credentials" error message, which seems to be caused by missing Kerberos credentials (the exact error messages can be found below). I a...

2017-05-08 16:05 (1) Answers

Python Dictionary Contains Encoded Values

I have a pandas DataFrame, oParameterData, which I built by querying Hadoop through a Hive ODBC connection. I am using it to populate a Python dictionary called oParameter: import pyodbc import pandas oConnexionString = 'Driver={ClouderaHive};[...]'...

2017-05-07 06:05 (1) Answers

Data sharing across multiple Spark applications

Thanks in advance. I need to work on Spark applications in which one Spark job creates or prepares data, and that data is shared across multiple Spark jobs running in parallel. I tried to find a solution for this and came across Apac...

2017-04-24 17:04 (0) Answers

Hive left join on nearest date

I am trying to join two tables in Hive using a key and the nearest date between the two tables at the time of the join. For example, below are the two input tables: <----------TABLE A-------------> <------------TABLE B------------> A_id A_...

2017-04-14 20:04 (4) Answers
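For the nearest-date join question above, one common Hive formulation is to left join the tables on the key, rank each table A row's matches by the absolute date difference (for example with DATEDIFF and a ROW_NUMBER() window partitioned by the A row), and keep rank 1, so that A rows without a match survive as in any left join. The Python sketch below reproduces that logic in memory to make the tie-breaking explicit; the row layout (key, date, payload) and the function name are hypothetical, since the original table columns are truncated.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical row layout: (join_key, event_date, payload).
Row = Tuple[str, date, str]


def left_join_nearest(a_rows: List[Row], b_rows: List[Row]) -> List[Tuple[Row, Optional[Row]]]:
    # Index table B by join key, like the equi-join part of the query.
    b_by_key: Dict[str, List[Row]] = {}
    for b in b_rows:
        b_by_key.setdefault(b[0], []).append(b)

    joined = []
    for a in a_rows:
        candidates = b_by_key.get(a[0], [])
        if not candidates:
            joined.append((a, None))  # left join: keep A rows with no match
            continue
        # Rank by |date difference| and keep the first, i.e. ROW_NUMBER() ... = 1.
        # Ties are broken by the earlier B date (an arbitrary but explicit choice).
        nearest = min(candidates, key=lambda b: (abs((b[1] - a[1]).days), b[1]))
        joined.append((a, nearest))
    return joined


a_table = [("k1", date(2017, 4, 10), "a1"), ("k2", date(2017, 4, 1), "a2")]
b_table = [("k1", date(2017, 4, 1), "b1"), ("k1", date(2017, 4, 12), "b2")]
for a, b in left_join_nearest(a_table, b_table):
    print(a[2], b[2] if b else None)
# a1 b2
# a2 None
```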

sort_array order by a different column, Hive

I have two columns: one of products and one of the dates they were bought. I am able to order the dates by applying the sort_array(dates) function, but I want to be able to sort_array(products) by the purchase date. Is there a way to do that in Hiv...

2017-04-14 19:04 (1) Answers
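For the sort_array question above, a commonly suggested Hive approach is to pack each date and product into a struct with the date as the first field, sort the resulting array (structs compare field by field, so the date drives the order), and then read back only the product field; it is worth confirming that your Hive version supports sort_array on arrays of structs. The Python sketch below mirrors that idea with tuples, which also compare field by field. The function name is hypothetical, and the dates are assumed to be ISO YYYY-MM-DD strings so that string order equals date order.

```python
from typing import List, Sequence


def sort_products_by_date(products: Sequence[str], dates: Sequence[str]) -> List[str]:
    """Return products ordered by their purchase dates (ISO 'YYYY-MM-DD' strings)."""
    if len(products) != len(dates):
        raise ValueError("products and dates must have the same length")
    # Like struct(date, product): the date is the first field, so it drives the sort;
    # equal dates fall back to comparing the product name.
    pairs = sorted(zip(dates, products))
    # Like reading the product field back out of each sorted struct.
    return [product for _, product in pairs]


print(sort_products_by_date(["tv", "phone", "laptop"],
                            ["2017-03-02", "2017-01-15", "2017-02-20"]))
# ['phone', 'laptop', 'tv']
```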