Questions tagged [pandasql]

pandasql allows you to query pandas DataFrames using SQL syntax. It works similarly to sqldf in R.

1
vote
1answer
41 views

How to append a Modin pandas dataframe to other?

I am working on performing calculations on large files around 6GB each file and came across Modin pandas which I heard optimized compared to pandas. I need to read a CSV file in chunks and perform ...
2
votes
0answers
33 views

How to Insert Huge Pandas Dataframe in MySQL table with Parallel Insert Statement?

I am working on a project where I have to write a data frame with Millions of rows and about 25 columns mostly of numeric type. I am using Pandas DataFrame to SQL Function to dump the dataframe in ...
-1
votes
0answers
32 views

Probability distribution disntance between two columns in Pandas

I have a dataframe of the type: userId | distrib1 | distrib2 | distrib3 ________________________________________ 125 21.2 20.6 1.1 143 19.7 16.2 3.2 426 ...
0
votes
0answers
24 views

COUNT field incorrect or syntax error with pandas read sql query

Trying to use pandas read_sql_query with a formatted sql string throws incorrect or syntax error I tried fudging all kinds of spaces in the string. What am I missing? See the query string below ...
0
votes
1answer
33 views

Python comparing millions of rows and hundreds of columns between two tables from relational DB

Currently our system is in live proving phase. So, we need to check whether the set of tables populated in production are matching with the tables populated in sandbox (test). At the moment we have ...
1
vote
2answers
24 views

Pandas dataframe transpose with column name instead of index throws ValueError

I am trying to show actual column name in json after dataframe has been transposed, below code works for LIMIT 3 in sql but fails if I try LIMIT 5 Any thoughts please? from pandasql import * pysqldf ...
0
votes
1answer
31 views

Pandas dataframe transpose with column name instead of index

I can't seem to figure out how to show actual column name in json after dataframe has been transposed. Any thoughts please? from pandasql import * import pandas as pd pysqldf = lambda q: sqldf(q, ...
0
votes
0answers
26 views

Is there a way to merge 2 data frame in Pandas that would keep the value of df1 intact

trying to merge two data frame in panda "table A" and "table B"."Table A" has 200K rows and table B" has 310K rows. Once merge I want the row of "table A" to stay 200K. I try to use the ...
2
votes
3answers
71 views

best way to use groupby or aggregate in pandas

I have a table called clients and I want to show how many times someone registered or purchased an item based on the userid. The goal is to have a table that shows the sum of registration_complete ...
1
vote
0answers
28 views

Errors writing a dataframe to DB2 using Pandas to_sql

I am trying to load data from a pandas dataframe to an IBM DB2 Data Warehouse environment. The table already exists so I am just appending rows to the table. I have built the dataframe to mirror ...
0
votes
1answer
51 views

How to remove duplicates based on text similarity across rows in Pandas

I have a dataset of news headlines. I'd like to remove duplicate or highly similar headlines based on textual similarity with headlines of the past ten days. For highly similar headlines, I want to ...
0
votes
2answers
37 views

Exit loop in python if SQL query doesn't bring any data

I am new to Python and have been given a task to download data from different Database ( MS SQl and Teradata ). The logic behind my code is as follow : 1: Code picks up data for Vendor from an excel ...
0
votes
1answer
58 views

In Python, using pandasql: query return “Empty DataFrame”

In Python, using pandasql: query return "Empty DataFrame" import pandas as pd import sqlite3 as db import pandasql dataSet = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/...
1
vote
2answers
64 views

summing all amounts by date in respect with individuals

I have this Dataframe df: payout person1 person2 date 1 300.0 LA NaN 2012-02-01 2 500.0 DO NaN 2012-02-01 3 600.0 DO NaN 2012-02-...
2
votes
2answers
51 views

Pandas: Aggregated and Group by - IDE: Pycharm

QQ - IDE: Pycharm - I am using the below Dataframe Sample format Name Business SegmentID Revenue Margin OrderQuantity James Commercial 1001 1500 100 1 ...
0
votes
0answers
27 views

use of local() in pandasql

The pandasql library uses the locals() function to execute SQL in the dataframe. def select_first_50(filename): students = pandas.read_csv(filename) students.rename(columns = lambda ...
0
votes
0answers
26 views

How to use python variables in pandasql?

I want to use Course Identity python variable in the pandasql query, but it is not working. Here is my code: Course_Identity='Mathematics' query=("""SELECT user_id, course_id, count( DISTINCT ...
0
votes
1answer
94 views

What's the equivalent for LISTAGG in Pandas?

I have the following aggregations: def my_agg(x): names = { 'first_seen': x['first_seen'].min(), 'last_seen': x['last_seen'].max(), 'md5s': x['md5s'].tolist(), } ...
0
votes
0answers
50 views

SQL query column doesn't exist

I have seen many answers about similar SQL queries, but for some reason none of the solutions works with my query. I am trying to read my query into a pandas data frame matching them to my (list) ...
0
votes
1answer
16 views

Variable used into a PSQL query

I have a problem with my code python, i'm using Pandasql, and what i want is to use my (enddate) in a query so: enddate = pd.to_datetime(datetime.today()).date() q2 = """SELECT * FROM res_q1 t1 ...
0
votes
0answers
17 views

Python. Append data to existing table with PK's and maintaining current datatypes

I managed to successfully append data to a MySQL table with 2 PK's, however, when doing so, the existing datatypes change to the ones pandas chooses. This is what I am doing: 1- definition to add PK ...
0
votes
0answers
114 views

Update table statement not working in pandasql

I have a dataset with some null values I wanted to update to read missing instead of null I've looked through all the forums and documentation and can't seem to find a response from pandasql ...
2
votes
1answer
29 views

Pandas - How to make a groupment in which a new column is the result of (sum of a column)/(number of itens grouped)?

I need to make some kind of groupment in which a new column (result) is the sum of the values column divided by number of items found? Could anyone help me, please? For example: Table A +-------+--...
0
votes
0answers
21 views

Pandas dataframe column creation using group by and where condition [duplicate]

I have table with column names : user_id, status_type Where status_type can take two distinct values delete or add only. I want to create table with columns: user_id,status_type_add_count,...
1
vote
1answer
302 views

pandasql OperationalError: too many SQL variables

I'm trying to work with pandasql but something is not working properly. I execute the following code: from pandasql import sqldf, load_meat pysqldf = lambda q: sqldf(q, globals()) meat = load_meat() ...
-1
votes
1answer
49 views

pandasql: count occurrences of pairs

I was trying to count the number of matches that A and B have ever played, the dataset looks like this: so the number of matches team1 and team 29 have played is 2 as they each once were HomeTeam and ...
1
vote
1answer
75 views

Pandasql query of a list or dict of dataframes

Given a set of files - each one of which is read into a distinct dataframe - how might a pandasql query reference them? In the following snippet we have a list of dataframes: but the same question ...
2
votes
0answers
122 views

pandas.read_sql is extremely slow on python 3 kernel compared to python 2

I have a simple parameterized select query hitting an Oracle database via pyodbc connection and fetching data in a dataframe via pandas.read_sql. The code is super efficient and fast in Python 2 ...
0
votes
1answer
41 views

overwrite and append pandas data frames on column value

I have a base dataframe df1: id name count 1 a 10 2 b 20 3 c 30 4 d 40 5 e 50 Here I have a new dataframe with updates df2: id name count 1 a ...
0
votes
0answers
16 views

cx_freeze - the exe returns undescriptive error on sqldf

Trying to create exe for my script, i am using pandasql and sqldf to create sql queries against 2 csv files for matching.after the build, when i run the .exe file, s screen with list of errors is ...
0
votes
1answer
89 views

After importing pandasql as sqldf , getting 'module' object is not callable error

I imported the below : import pandasql as sqldf import pandas as pd import numpy as np from pandasql import load_meat, load_births pysqldf = lambda q: sqldf(q, globals())** meat=load_meat() when I ...
0
votes
2answers
257 views

using pd.read_sql() to extract large data (>5 million records) from oracle database, making the sql execution very slow

Initially tried using pd.read_sql(). Then I tried using sqlalchemy, query objects but none of these methods are useful as the sql getting executed for long time and it never ends. I tried using Hints....
2
votes
0answers
91 views

Using dictionary values in pandasql

I have a dictionary that has dataframes in values, something like this - mydict = {'demand': demand_df, 'supply':supply_df, 'prod': prod_df} Then I am using pandasql module to execute a simple query....
4
votes
1answer
2k views

Pandas Merge two rows into a single row based on columns

I have 2 rows that look like these, ------------------------------ DealName | Target | Acquirer | ----------------------------- ABC-XYZ | ABC | None | ------------------------------ ABC-XYZ |...
2
votes
1answer
502 views

PandaSQL very slow

I'm currently switching from R to Python (anconda/Spyder Python 3) for data analysis purposes. In R I used to use a lot R sqldf. Since I'm good at sql queries, I didn't want to re-learn data.table ...
3
votes
0answers
417 views

Python pandasql sqldf giving “too many SQL variables” when number of values (RxC) in dataframe is over 999

Background: I'm ultimately trying to join two tables that reside in completely different databases. I've loaded the data into dataframes with the intent of executing a SQL join through sqldf (both ...
0
votes
1answer
250 views

Multiprocessing Pandas SQL

I'm getting a table not found error when I utilize multiprocessing on chunked Pandas dataframe in each processor started by the mp library. I'm using pandasql library for SQL in the following manner: ...
0
votes
2answers
201 views

Joining two tables using pandasql

I am running SQL queries in Python using Pandasql. The queries x, y and z work fine but u gives an error x = pysql("select * from flight f left join iata i on f.ORIGIN = i.IATA;") y = pysql("select *...
0
votes
1answer
170 views

Error with query in pandasql

I am very new to PandaSQL and have never used it before. Here is my code up until now: import pandas as pd from pandasql import sqldf import numpy as np tasks = pd.read_csv("C:/Users/RMahesh/...
1
vote
0answers
43 views

How to iterate over pandas dataframe row in python and create new column based on values present in row

I have a patient data frame and sample looks like this. pat_id drug date lag_date lag_grace_dte 1234 A 1-Jan-17 1-Jan-17 11-Jan-17 ...
2
votes
0answers
750 views

Pandas Group by a Column and take the non null values in different rows for each column [duplicate]

I have an interesting question for which I am not getting any built-in solution in Pandas. I have data like: A B C D E F 1 null null Red null 12 1 89 76 Red null null 1 null ...
0
votes
2answers
1k views

Group by and Sum in Pandas without losing columns

I have a Dataframe that looks like this: -------------------------------------------------------------------- |TradeGroup | Fund Name | Contribution | From | To | | A | Fund_1 ...
0
votes
1answer
26 views

covert sql query output to json in python

I want the output of the query to be consumed by charts js. They require json/array format for values and labels. I am new to python and pandas. My code - import pandasql as pdsql str="""select ...
0
votes
0answers
31 views

How In Import Hex number into a Pandas Data Frame

I am using Pandas 0.22.0, Numpy 1.14.0 with Python 3.6 and I am importing data from MS SQL, one of the fields holds an OID value of 0x537AA051BEBB01A2 when I read this in to a data frame in Pandas it ...
-1
votes
1answer
223 views

i have an error in using pandasql .my table is not getting identified

error: PandaSQLException: (sqlite3.OperationalError) no such table: aadhaar_data [SQL: 'select registrar,enrolment_agency from aadhaar_data limit 50;'] (Background on this error at: http://sqlalche....
0
votes
2answers
144 views

How do I get today's date in SQLite?

I am trying to use Pandasql to query my dataframe. However, it is giving me an empty dataframe even though I know it shouldn't. I think it is because I have used today's date incorrectly in the where ...
0
votes
0answers
101 views

How do I use today's date in the where clause when using Pandasql in Python?

I am struggling to use some SQL functions in Pandasql as it seems that the same names aren't used for the functions. For example, using Pandasql in Python I type: q = """ select e.Date, e.SITA, e....
0
votes
0answers
181 views

pandasql int too large to convert error

I have data like the example data below. When I run the code below to get the count of distinct Product ID by prod cat, I’m getting the error below. Pandas groupby with nunique doesn't seem to have ...
0
votes
0answers
94 views

Both Cython and Numba, Pandasql sqldf select statement throws sqlite3.OperationalError: no such table

I am really new to Python programming. I have a dataframe pandasql query which runs fine when I run my code with the standard Python3 implementation. However, after cythonizing it, I always get the ...
2
votes
1answer
1k views

filter pandas data on specific index

I'd like to filter a dataframe based on specifics index. I've read things about query but I don't succeed. Here is the code which create my pivot table. I'd like to filter on specific members df ...