Questions tagged [hive]

Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible distributed file systems. Hive provides a mechanism to project structure onto this data and query it using a SQL-like language called HiveQL.

0
votes
0 answers
5 views

Accessing multiple COS instances under the same IBM account

I am trying to access multiple COS instances that exist under the same IBM account. I see that each COS instance has a different access key and secret key. The property in hive.xml (fs.s3a.access....
0
votes
0 answers
5 views

Hive query throwing SemanticException if partitions in WHERE condition exceed some number (30 in my case) and succeeding if partitions are fewer than 30

I am running a Hive query and getting the exception below: HiveSQLException: Error while compiling statement: FAILED: SemanticException MetaException(message:The arguments for IN should be the ...
1
vote
1 answer
7 views

Hive: Can't perform union query with limit

I am trying to run a union all query in Hive: select * from tabName where col1='val1' and col2 = 'val2' limit 10 union all select * from tabName where col1='val1' and col2 = 'val3' limit 10; but I ...
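
A hedged sketch of one common workaround (using the table and column names from the excerpt): wrap each limited SELECT in a parenthesized subquery with an alias, so each LIMIT binds to its own branch rather than to the whole UNION ALL.

    SELECT * FROM (
      SELECT * FROM tabName WHERE col1 = 'val1' AND col2 = 'val2' LIMIT 10
    ) a
    UNION ALL
    SELECT * FROM (
      SELECT * FROM tabName WHERE col1 = 'val1' AND col2 = 'val3' LIMIT 10
    ) b;
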
1
vote
1 answer
10 views

How to get the Nth string after a pipe delimiter in Hive

I have a table in Hive from which I want to extract the 5th component of a pipe-delimited string in one of the columns. Sample data: john:12|doe|google|usa|google.com|newspaper - title - 1 -...
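
A minimal HiveQL sketch, assuming a hypothetical table t with a string column line: split on the escaped pipe and index the resulting array (indexing is 0-based, so the 5th component is element 4).

    -- split() takes a regex, so the pipe must be escaped
    SELECT split(line, '\\|')[4] AS fifth_component
    FROM t;
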
1
vote
1 answer
12 views

How can I disable transactions for a Hive table?

I have a Hive table that was originally created as transactional, but I want to disable transactions on the table because they are not actually needed. I tried to disable them using ALTER TABLE, but ...
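
Hive generally refuses to convert a transactional (ACID) table back to non-transactional in place, so the workaround usually suggested (a hedged sketch; table names are illustrative) is to copy the data into a new table created with transactions disabled and then swap the tables.

    -- non-transactional copy of the data; on Hive 3 you may also need to make it EXTERNAL
    -- so the managed-table defaults do not turn transactions back on
    CREATE TABLE my_table_copy
    STORED AS ORC
    TBLPROPERTIES ('transactional'='false')
    AS SELECT * FROM my_table;
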
0
votes
0 answers
8 views

Query an array of struct within an array of struct in Impala

I have a Hive table containing a complex type: an array of string within an array of struct. I tried to query it from Impala and extract the array elements into columns, but when I do that it returned the ...
0
votes
0 answers
17 views

I want to use elements of a list as columns of an SQL table [on hold]

I am trying to build a table on a Hive server using Python and want the column names of the table to be the elements of a list. I am trying to write the query in Python itself, and want to know how to specify the data type of each ...
0
votes
4 answers
27 views

How do I point to a default dummy record when a foreign key is null

I have a query in which I join multiple dimension tables to create a fact table. For cases when the foreign key is null, I want to point it to the default dummy records added in the dimension tables so that there is no ...
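
One common pattern (a sketch only; the table names, column names, and the -1 surrogate key are illustrative assumptions) is to COALESCE the foreign key to the dummy key in the join condition, provided the dimension actually contains a row with that key.

    SELECT f.*, d.dim_attr
    FROM fact_stage f
    JOIN dim_table d
      ON COALESCE(f.dim_fk, -1) = d.dim_key;   -- -1 is the dummy record's key
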
0
votes
3 answers
20 views

Finding data where the value is the same in two columns from the same table

So I have a simple table that holds items we offered to a customer and the items the user actually used, per day. date | offered_name | used_name | h_id ---------------------------------...
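
A minimal sketch using the column names from the excerpt (the table name is assumed):

    SELECT `date`, offered_name, used_name, h_id
    FROM offers
    WHERE offered_name = used_name;
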
0
votes
0 answers
11 views

Hive - How to make 5-minute time windows (on a timestamp field) and do aggregation (e.g. count) on other fields?

I'm trying to reduce duplicates in my streaming data every 5 minutes and would like to aggregate the counts of those records. The timestamp in each record might be the same or different. I want to apply dedup ...
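
A hedged HiveQL sketch of one way to bucket a timestamp into 5-minute windows (the table and column names are assumptions): truncate the epoch seconds to a 300-second boundary and group on it.

    SELECT from_unixtime(floor(unix_timestamp(event_ts) / 300) * 300) AS window_start,
           count(*) AS cnt
    FROM events
    GROUP BY from_unixtime(floor(unix_timestamp(event_ts) / 300) * 300);
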
-1
votes
1 answer
35 views

Concatenate string rows for each unique ID in a particular order

I want to create a table where each row is a unique ID and the Place and City columns consist of all the places and cities a person visited, ordered by the date of visit, using either PySpark or ...
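
A hedged HiveQL sketch (table and column names are assumptions): sort within each ID first, then collect and concatenate. Note that collect_list does not formally guarantee ordering, so this is a best-effort approach rather than a guaranteed one.

    SELECT id,
           concat_ws(',', collect_list(place)) AS places,
           concat_ws(',', collect_list(city))  AS cities
    FROM (
      SELECT id, place, city, visit_date
      FROM visits
      DISTRIBUTE BY id
      SORT BY id, visit_date
    ) ordered
    GROUP BY id;
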
0
votes
0 answers
21 views

Hive on Spark: Insufficient resources

I followed this tutorial and configured Spark as the execution engine for Hive. However, it hangs on every simple query. I tried: set spark.executor.memory = 1g; set spark.driver.memory = 1g; set spark....
0
votes
1 answer
20 views

pyspark 2.4 cannot create table from SQL command: Hive support is required to CREATE Hive TABLE

I'm using pyspark 2.4 and I have already enabled Hive support: spark = SparkSession.builder.appName("spark").enableHiveSupport().getOrCreate() but when I run: spark.sql(""" CREATE TABLE ...
0
votes
0 answers
12 views

Inconsistent count results from Apache Hive

We have the latest Hortonworks HDP, with Hive version 3.1.0. I have a problem when trying to count the number of rows on a given condition. The count(*) returns an incorrect value when executed side by ...
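
One frequent cause of this symptom is that Hive answers count(*) from stale table statistics instead of scanning the data. A hedged sketch of the usual checks (the table name is made up):

    -- force a real scan instead of answering from metastore statistics
    SET hive.compute.query.using.stats=false;
    SELECT count(*) FROM my_table;

    -- or refresh the statistics so the fast path becomes correct again
    ANALYZE TABLE my_table COMPUTE STATISTICS;
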
0
votes
1 answer
27 views

How to move 50 GB of RDBMS data into Hadoop and process it? What is the minimum hardware requirement for processing 50 GB of data using Hadoop?

How to move 50 GB of RDBMS data into Hadoop and process it? What is the minimum hardware requirement for processing 50 GB of data using Hadoop?
0
votes
0 answers
30 views

How to select a subset of columns based on column priority

Selecting from one application and inserting into another. The first system has 9 phone number fields, the second has only 4. I don't want to map a static 4 from old to new. For each data record I ...
0
votes
0 answers
10 views

SQL Server data transfer to Hadoop

It's possible to transfer a SQL Server database to Hadoop 2.6.0 using Sqoop. My SQL Server is on a Windows platform and Hadoop is installed on an Ubuntu machine. I use this command to transfer: sqoop ...
1
vote
1 answer
42 views

Grouping rows so a column sums to no more than 10 per group

I have a table that looks like: col1 ------ 2 2 3 4 5 6 7 with values sorted in ascending order. I want to assign each row to groups with labels 0,1,...,n so that each group has a total of no more ...
-1
votes
0 answers
8 views

How to give access to Presto, Hive and S3 using some API key instead of username and password? [on hold]

I have access to Presto and Hive as they are in the same VPC, and access to S3 using AWS secret and access keys. I realised that all my clients cannot be in the same VPC (not smart enough) and that I ...
1
vote
1 answer
23 views

How to undo ALTER TABLE … ADD PARTITION without deleting data

Let's suppose I have two Hive tables, table_1 and table_2. I use: ALTER TABLE table_2 ADD PARTITION (col=val) LOCATION [table_1_location] Now, table_2 will have the data in table_1 at the partition ...
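
A hedged sketch of the approach usually suggested: mark the table as external before dropping the partition, so DROP PARTITION removes only the metadata and leaves the underlying files (which belong to table_1) untouched.

    -- treat drops as metadata-only (illustrative; switch back afterwards if needed)
    ALTER TABLE table_2 SET TBLPROPERTIES ('EXTERNAL'='TRUE');
    ALTER TABLE table_2 DROP PARTITION (col='val');
    ALTER TABLE table_2 SET TBLPROPERTIES ('EXTERNAL'='FALSE');
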
0
votes
0 answers
17 views

How to merge existing hourly partitions to daily partition in hive

My requirement is to merge existing hourly partitions into daily partitions for all days. My partition column values look like 2019_06_22_00, 2019_06_22_01, 2019_06_22_02, 2019_06_22_03, ..., 2019_06_22_23 =>...
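
A hedged sketch of one approach (table and column names are assumptions): load the hourly data into a second table partitioned by day, deriving the daily value from the first 10 characters of the hourly partition key.

    SET hive.exec.dynamic.partition.mode=nonstrict;

    INSERT OVERWRITE TABLE my_table_daily PARTITION (day_part)
    SELECT col1, col2,                           -- list the non-partition columns explicitly
           substr(hour_part, 1, 10) AS day_part  -- 2019_06_22_00 -> 2019_06_22
    FROM my_table_hourly;
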
0
votes
0 answers
9 views

I'm having an error while building a Cube in Apache Kylin

See the logs in step one: java.io.IOException: OS command error exit with return code: 127, error message: Logging initialized using configuration in jar:file:/home/jefferson/Developer/apache-...
0
votes
1 answer
19 views

Hive performance improvement

I want to join a table with 1 TB of data to another table that also has 1 TB of data in Hive. Could you please suggest some best practices to follow? I want to know how performance will be improved in Hive ...
-1
votes
1 answer
27 views

Use a Scala collection in a Hive query

I have a Scala array collection. I need to access the values in this collection from a Hive query. Can this be achieved using HiveContext?
0
votes
0 answers
16 views

View field types in Hive

I have a query: with cte1 as (...), cte2 as (... from cte1) select * from cte2 This returns cte2. However, I'd like to see the column types of cte2. I tried: select typeof(field1), typeof(field2) ...
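
Hive cannot DESCRIBE a CTE directly, so a hedged workaround (the probe-table name is made up, and the (...) placeholders stand for the original CTE bodies) is to materialize the query and describe the result.

    CREATE TEMPORARY TABLE cte2_probe AS
    WITH cte1 AS (...),
         cte2 AS (... FROM cte1)
    SELECT * FROM cte2;

    DESCRIBE cte2_probe;   -- lists each column with its type
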
1
vote
1 answer
62 views

Skewed Window Function & Hive Source Partitions?

The data I am reading via Spark is a highly skewed Hive table with the following stats. (MIN, 25TH, MEDIAN, 75TH, MAX) via Spark UI: 1506.0 B / 0 232.4 KB / 27288 247.3 KB / 29025 371.0 KB / ...
0
votes
1 answer
23 views

Is there a pseudocolumn in Hive/Presto to get the “last modified” timestamp of a given file?

I have an external table in Athena linked to a folder in S3. There are some pseudocolumns in Presto that allow me to get some metadata about the files sitting in that folder (for ...
0
votes
1 answer
17 views

Unable to cast field creation to bigint

I am trying to union a bunch of tables, and some of these fields are of type array of bigint. When I try to cast these in Hive, it keeps giving me an "Expression not in GROUP BY key 'field_name'" error. My ...
0
votes
0 answers
13 views

How to import-all-tables from MySQL to Hive using Sqoop for a particular database in Hive?

sqoop import-all-tables into Hive with the default database works fine, but sqoop import-all-tables into a specified Hive database is not working. As --hive-database is deprecated, how do I specify the database ...
1
vote
1 answer
18 views

Select statement inside a case statement for Impala

I need to eliminate the hardcoded values in the CASE and have developed a lookup table to get the values from. The lookup table looks like: serial_no code description type 1 ...
1
vote
1 answer
23 views

Hive table data load gives NULL values

Select * from movierating gives NULL values as a result. I have tried the create table queries below: CREATE TABLE movierating(id INT, movieid INT, rating INT, time string); CREATE TABLE movierating(id ...
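
NULLs on SELECT right after a load are very often a delimiter mismatch between the data file and the table definition; a hedged sketch assuming the file is comma-delimited text:

    CREATE TABLE movierating (
      id      INT,
      movieid INT,
      rating  INT,
      time    STRING
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','   -- must match the delimiter actually used in the file
    STORED AS TEXTFILE;
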
0
votes
1 answer
52 views

Spark SQL: How to convert time string column in “yyyy-MM-dd HH:mm:ss.SSSSSSSSS” format to timestamp preserving nanoseconds?

I am trying to convert a String type column which is having timestamp string in "yyyy-MM-dd HH:mm:ss.SSSSSSSSS" format to Timestamp type. This cast operation should preserve nanosecond values. I ...
1
vote
1 answer
25 views

How to concatenate small Parquet files in Hive

How to concatenate small Parquet files in Hive when the following are in place: partitions are created dynamically on the Hive table, and the table is EXTERNAL. The solution tried so far is for ORC files, which has a bug: ...
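
ALTER TABLE ... CONCATENATE is not supported for Parquet, so one commonly suggested alternative (a hedged sketch; the table and partition names are assumptions) is to rewrite the data with Hive's small-file merge settings enabled.

    SET hive.merge.mapfiles=true;
    SET hive.merge.mapredfiles=true;
    SET hive.merge.tezfiles=true;
    SET hive.merge.smallfiles.avgsize=134217728;   -- 128 MB
    SET hive.merge.size.per.task=268435456;        -- 256 MB
    SET hive.exec.dynamic.partition.mode=nonstrict;

    -- rewriting the partitions lets the merge step combine the small Parquet files
    INSERT OVERWRITE TABLE my_parquet_table PARTITION (dt)
    SELECT * FROM my_parquet_table;
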
1
vote
0 answers
15 views

Reading a hive table in pyspark after altering the schema

I added a column to a Hive table: ALTER TABLE table_name ADD COLUMNS (new_col string); But when I read the table using PySpark (2.1), I see the old schema. How do I pick up the updated schema?
-1
votes
0 answers
17 views

While pushing incremental data into hive tables, which among “Exchange partition” and “Insert overwrite” is efficient?

I am trying to merge incremental data into Hive tables on HDFS. After doing some searching online, I've found this link: Four Steps Strategy to push incremental data. I have created steps 1, 2, 3 ...
0
votes
0 answers
10 views

Oozie handling sensitive parameters (passwords)

By default, Oozie will log any parameter. How can I prevent this behavior for sensitive parameters like passwords? I know that some services can read directly from a file; how can I provide ...
0
votes
0 answers
14 views

Unable to connect Hive to local Windows machine. Getting connection error: java.sql.SQLException: Could not open client transport with JDBC Uri

I am trying to make a connection between HiveServer2 and my local Windows machine with Python. I have a connection string and a keystore file. I am using the Jaydebeapi Python module to solve this issue. The ...
1
vote
1 answer
24 views

How to write a select statement that outputs all key values in all rows

My Hive table has a map column with zero or many key-value pairs. I don't even know most of the keys. I want to write a select statement that outputs all key-value pairs in all rows, something like select t....
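
A minimal HiveQL sketch (table and column names are assumptions): explode the map column with a lateral view so every key/value pair becomes its own row.

    SELECT t.id, kv.map_key, kv.map_value
    FROM my_table t
    LATERAL VIEW explode(t.props) kv AS map_key, map_value;
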
0
votes
1 answer
26 views

Is it possible to change a column name in Spark SQL in Hive?

I'm trying to rename a column (date type), but I wasn't sure if the syntax was wrong or if this just isn't possible in Spark SQL: ALTER TABLE user.temp_medicalclaims CHANGE vendor_test_id date_service ...
1
vote
1 answer
53 views

Replace space between a pattern <>

I have a requirement to parse XML tags, but some tags appear with blanks like below. So basically I want to remove the blank characters inside the XML tags using regex. <Employee >< Name&...
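
A hedged sketch using Hive's regexp_replace (the column and table names are assumptions): trim whitespace just inside the angle brackets, using a Java-regex backreference for the tag body.

    SELECT regexp_replace(xml_col, '<\\s*([^<>]*?)\\s*>', '<$1>') AS cleaned
    FROM my_table;
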
0
votes
1 answer
18 views

Issue while passing a parameter to HQL

Error: bash: line 1: syntax error near unexpected token `(' while passing varchar(16) as a parameter in HQL: hive --hivevar id_variable_type="${id_variable_type}" -f $HIVE_SCRIPT_DIR/tds_validation....
-1
votes
1 answer
23 views

How to create a quarterly (fiscal year) temp table in SQL

I'm working with a large medical claims dataset that spans over 3 years (in Hive). I want to break each year up into quarters, based on one of the column values, date_service. Here's roughly what I'...
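
A hedged sketch of the usual CASE-on-month approach (the table name and the quarter boundaries are assumptions to adjust to the actual fiscal calendar); date_service is taken from the excerpt.

    SELECT c.*,
           CASE
             WHEN month(date_service) BETWEEN 1 AND 3 THEN 'Q1'
             WHEN month(date_service) BETWEEN 4 AND 6 THEN 'Q2'
             WHEN month(date_service) BETWEEN 7 AND 9 THEN 'Q3'
             ELSE 'Q4'
           END AS fiscal_quarter
    FROM claims c;
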
0
votes
0 answers
10 views

Error in Sqoop import command to ingest data from SQL Server

The Sqoop import job is failing with the error below: Import failed: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://nameservice1/user/srvc_sqoop/COLABORA already exists. I am ...
2
votes
2 answers
35 views

Group timestamped records by consecutive repeating flag

I have a dataset with the following columns: DriverId DateStamp IsDriving WasDriving DistanceSincePrev SecondsSincePrev 1 11/10/2018 08:00 0 0 0 12 ...
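
This is the classic gaps-and-islands pattern; a hedged sketch using the column names from the excerpt (the table name is assumed, and DateStamp needs to order chronologically): the difference of two row_number() sequences is constant within each consecutive run of the same IsDriving value.

    SELECT DriverId, IsDriving, grp,
           min(DateStamp) AS run_start,
           max(DateStamp) AS run_end
    FROM (
      SELECT DriverId, DateStamp, IsDriving,
             row_number() OVER (PARTITION BY DriverId ORDER BY DateStamp)
           - row_number() OVER (PARTITION BY DriverId, IsDriving ORDER BY DateStamp) AS grp
      FROM trips
    ) runs
    GROUP BY DriverId, IsDriving, grp;
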
0
votes
2 answers
77 views

How to find the average of an array column based on index in pyspark

I have data as shown below ----------------------------- place | key | weights ---------------------------- amazon | lion | [ 34, 23, 56 ] north | bear | [ 90, 45] amazon | lion ...
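
The excerpt mentions PySpark, but since the table is in Hive, one hedged HiveQL sketch (table name assumed, column names from the excerpt) is to posexplode the array and average by position.

    SELECT place, `key`, pos, avg(w) AS avg_weight
    FROM my_table
    LATERAL VIEW posexplode(weights) e AS pos, w
    GROUP BY place, `key`, pos;
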
1
vote
1 answer
19 views

Combine two partitioned tables into one table but into two different partitions

I am new to Hive and would appreciate it if someone could help with a Hive query I am dealing with. There are two tables, A and B, with exactly the same schema but different data, each with 4 partitions. I need to ...
0
votes
1 answer
32 views

How to refactor AND and OR in a left outer join, as Hive doesn't support OR in the ON clause

I'm running a Hive left outer join query which involves AND and OR in the ON clause. Hive doesn't support OR in the ON clause. How do I rewrite this to run in Hive? If UNION is one of the answers, please note ...
0
votes
0 answers
11 views

How to fix MoveTask error in Hive, works on limited scope

EDIT: Solved! I needed to increase hive.exec.max.dynamic.partitions beyond the 5k I was already using. I'm working on loading a partitioned Hive table. The source table is a ...
1
vote
0 answers
15 views

Stop auto unpersist of a dataframe after writing to a Hive table

I want to persist a dataframe even after writing it to a Hive table. <change data capture code> df.persist(StorageLevel.MEMORY_AND_DISK) df.count() # count is 100 df.write.mode("append").insertInto("...
0
votes
0 answers
15 views

Unable to import data into Hive from SQL Server

I am trying to import a table from SQL Server to Hive, but it is giving the error below: ERROR tool.ImportTool: Import failed: There is no column found in the target table COLABORA. Please ensure that ...