Questions tagged [pandas]

Pandas is a Python library for data manipulation and analysis, e.g. dataframes, multidimensional time series and cross-sectional datasets commonly found in statistics, experimental science results, econometrics, or finance. Pandas is one of the main data-science libraries in Python.

0 votes, 0 answers, 8 views

Map Reduce to process dataframes pairwise in Pandas

I have an Elasticsearch database, and it has lots of timeseries data. I would like to load this data into memory in batches, put it in a Dataframe, do some operations, and then reduce this data with ...
0 votes, 0 answers, 8 views

Pandas: What is dtype = <U64, and How Do I Convert it to String?

I have a table, and one column is loaded as np.str from csv. But the dtype says this weird U64 (I guess meaning, unsigned int 64 bit?) and converting with astype doesn't work. stringIDs = ...
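For reference (a sketch, not taken from the thread): `<U64` is NumPy's fixed-width Unicode string dtype. `<` is the byte order, `U` means Unicode and 64 is the maximum number of characters, so it is not an unsigned 64-bit integer. The IDs below are hypothetical.

```python
import numpy as np

# Hypothetical IDs; NumPy stores them as fixed-width Unicode strings.
string_ids = np.array(["A001", "B2", "C30"])
print(string_ids.dtype)       # <U4 on little-endian machines: Unicode, up to 4 characters
print(string_ids.dtype.kind)  # U: Unicode string (unsigned integers would be kind u)

# tolist() gives ordinary Python str objects; astype(object) keeps an array of Python str.
as_python_str = string_ids.tolist()
as_object_array = string_ids.astype(object)
```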
0 votes, 0 answers, 9 views

Data of file exported to Excel and CSV varying

I am exporting a dataframe to an Excel as well as a CSV file. Certain columns have data in the format of integers. These values are being shown as integers in Excel and the dataframe output. But, they ...
0 votes, 0 answers, 5 views

Python Resample and Interpolate within a group

I have a data set which contains samples at the 1 second level from workout data (heart rate, watts, etc.) The data feed is not perfect and sometimes there are gaps. I need to have the dataset at 1 ...
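A minimal sketch of per-group resampling to 1-second bins followed by linear interpolation, assuming a long-format frame with hypothetical columns `workout` (the group key), `time` and `heart_rate`; the question's actual column names are not shown.

```python
import pandas as pd

# Hypothetical 1-second samples for two workouts, with gaps.
df = pd.DataFrame({
    "workout": ["a", "a", "a", "b", "b"],
    "time": pd.to_datetime(["2019-01-01 00:00:00", "2019-01-01 00:00:01", "2019-01-01 00:00:04",
                            "2019-01-01 00:10:00", "2019-01-01 00:10:02"]),
    "heart_rate": [100.0, 102.0, 108.0, 90.0, 94.0],
})

out = (df.set_index("time")
         .groupby("workout")["heart_rate"]
         .resample("1s")      # one row per second within each workout; gaps become NaN
         .mean()
         .interpolate()       # linear fill of the NaN gaps
         .reset_index())
```

Here the interpolation only fills gaps inside each group, because every group's resampled range starts and ends on a real sample.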
-1 votes, 0 answers, 20 views

Python: numpy int64 object has no attribute values

I am using pandas module in python to read in a csv-file: myData = pd.read_csv('File.csv', sep=';') Timestamp = myData.iloc[1:,0].values.tolist() - myData.iloc[0,0].values.tolist() Values = myData....
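A sketch of the likely cause, using a hypothetical stand-in for File.csv: `iloc[0, 0]` selects a single cell and returns a NumPy scalar, which has no `.values`, while `iloc[1:, 0]` returns a Series.

```python
import pandas as pd

# Hypothetical data standing in for File.csv.
my_data = pd.DataFrame({"t": [10, 12, 15, 19], "v": [1.0, 2.0, 3.0, 4.0]})

first = my_data.iloc[0, 0]    # one cell: a NumPy scalar (numpy.int64), which has no .values
rest = my_data.iloc[1:, 0]    # a slice: a Series, which does have .values

# Subtract the scalar from the Series directly, then convert to a list.
timestamp = (rest - first).tolist()
```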
0 votes, 0 answers, 15 views

Find maximum abs. value for each group of row index, arrange max. values diagonally in matrix, non-diagonal values as per indexes, find determinant

I am new to Python. I want to find the largest values from all the columns for each group of identical row indexes (i.e. 5 to 130, beginning with 5), and also show its row and column index label in output. ...
0 votes, 0 answers, 17 views

How do I select one (or several) dates in a dataframe

My dataframe info is below. I would like to create another dataframe selecting only dates=1997-5. In SAS this would be done using the "where" command... Can you please help? <class 'pandas.core.frame....
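A minimal sketch, assuming the dates live in a datetime64 column named `date` (hypothetical; the excerpt cuts off before the column list). A boolean mask plays the role of the SAS WHERE clause.

```python
import pandas as pd

# Hypothetical frame with a datetime column named "date".
df = pd.DataFrame({
    "date": pd.to_datetime(["1997-04-30", "1997-05-01", "1997-05-31", "1997-06-01"]),
    "x": [1, 2, 3, 4],
})

# Rows from May 1997:
may_1997 = df[(df["date"].dt.year == 1997) & (df["date"].dt.month == 5)]

# Rows matching one or several exact dates:
picked = df[df["date"].isin(pd.to_datetime(["1997-04-30", "1997-06-01"]))]
```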
0 votes, 0 answers, 14 views

Line graph not proper in Python Bokeh after using groupby

Line graph not proper in Python Bokeh after using groupby. The code is: newData = data.groupby([data.OrderDate.dt.year, 'Category'])['Sales'].sum().reset_index() df = newData #pd.DataFrame....
2 votes, 2 answers, 26 views

Binning a column of float values into strings with pandas

This code was working until I upgraded from Python 2.x to 3.x. I have a df consisting of 3 columns, ipk1, ipk2 and ipk3, containing float numbers from 0 to 4.0. I would like to bin them into ...
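A minimal sketch with `pd.cut`, which behaves the same way on Python 3; the cut points and labels are hypothetical because the excerpt stops before the desired bins.

```python
import pandas as pd

# Hypothetical values in the 0 to 4.0 range described in the question.
df = pd.DataFrame({
    "ipk1": [0.0, 2.5, 3.2, 4.0],
    "ipk2": [1.9, 3.0, 3.6, 2.1],
    "ipk3": [3.9, 0.5, 2.0, 3.5],
})

bins = [0.0, 2.0, 3.0, 3.5, 4.0]               # hypothetical cut points
labels = ["low", "fair", "good", "excellent"]   # hypothetical string labels

for col in ["ipk1", "ipk2", "ipk3"]:
    # Intervals are right-closed, e.g. (2.0, 3.0]; include_lowest also puts 0.0 in the first bin.
    df[col + "_bin"] = pd.cut(df[col], bins=bins, labels=labels, include_lowest=True).astype(str)
```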
-1 votes, 2 answers, 26 views

Perform one-hot encoding on pandas dataframe on multiple column types

So I have a pandas dataframe where certain columns have values of type list and a mix of columns of non-numeric and numeric data. Example data dst_address dst_enforcement fwd_count ... 1 1.2....
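One possible approach, sketched with hypothetical columns (`proto` for a scalar categorical column, `tags` for a list column, `fwd_count` for a numeric one): explode the list column, one-hot encode it, and collapse back to one row per original index.

```python
import pandas as pd

# Hypothetical data: one scalar categorical column, one list column, one numeric column.
df = pd.DataFrame({
    "proto": ["tcp", "udp", "tcp"],
    "tags": [["a", "b"], ["b"], []],
    "fwd_count": [1, 5, 2],
})

# List column: one row per element, one-hot encode, then take the max per original row.
# An empty list explodes to NaN, which get_dummies encodes as all zeros.
tag_dummies = (pd.get_dummies(df["tags"].explode(), dtype=int)
                 .groupby(level=0).max()
                 .add_prefix("tags_"))

# Scalar categorical columns go straight through get_dummies; numeric columns pass through.
encoded = pd.get_dummies(df.drop(columns="tags"), columns=["proto"], dtype=int).join(tag_dummies)
```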
1 vote, 3 answers, 17 views

How to plot multiple categorical data using scatter plot assigning different color?

I have two columns of categorical variables and I want to plot each of the columns against the same x-axis. For example, for the following csv file, I want to plot type and assign color according to ...
1 vote, 1 answer, 13 views

Iterate through pandas DataFrameGroupBy object to create yearly images with monthly subplots

I have a pandas DataFrame with datetime index of hourly wind speed and direction. My timeseries covers 31 years and I need to make yearly images of monthly windroses. This translates into 31 images ...
1 vote, 1 answer, 20 views

How to group by date and another column in pandas

How do I use groupby on date (by year) and category (which has 3 values), so that the sales amount is summed for each year? I have tried using groupby and it has not worked out; this is ...
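A minimal sketch of grouping by the year of a date column together with a category column. The column names are borrowed from the groupby call in the Bokeh question above; the rows are hypothetical.

```python
import pandas as pd

# Hypothetical order data.
data = pd.DataFrame({
    "OrderDate": pd.to_datetime(["2017-01-05", "2017-03-10", "2018-02-01", "2018-07-19", "2018-11-30"]),
    "Category": ["Furniture", "Technology", "Furniture", "Furniture", "Technology"],
    "Sales": [100.0, 250.0, 80.0, 20.0, 300.0],
})

# Group on the year extracted from the date plus the category, then sum sales.
yearly = (data.groupby([data["OrderDate"].dt.year.rename("Year"), "Category"])["Sales"]
              .sum()
              .reset_index())

# One column per category, one row per year: a convenient shape for one line per category.
wide = yearly.pivot(index="Year", columns="Category", values="Sales")
```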
1 vote, 1 answer, 13 views

How to sort MultiIndex level by number of rows in the child level

I have historical data of quantity and Amount (how much was charged in the transaction) for items sold by a company to many different customers. I am looking to do some time series analysis on this ...
0 votes, 1 answer, 13 views

Load CSV and delete \r\n in Python

I cannot load my CSV with pandas correctly. I have some matrices and vectors, but I get a (\r\n). For example, I tried this: test = pd.read_csv('test.csv',sep='\t',index_col=0) test.head() and I ...
0 votes, 1 answer, 15 views

How can I select data in a MultiIndex DataFrame and have the resulting DataFrame keep an appropriate index

I have a MultiIndex DataFrame and I'm trying to select data in it based on certain criteria; so far so good. The problem is that once I have selected my data using .loc and pd.IndexSlice, the ...
0 votes, 0 answers, 17 views

Expanding a list in a pandas dataframe cell to form new rows [duplicate]

I have a data frame that looks vaguely like this: import pandas as pd d = {'A':[1,2,3], 'B':['X','Y','Z'], 'C':[['m','n','o'],['p','q'], ['r']]} df = pd.DataFrame(d) df A B C 0 ...
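`DataFrame.explode` (pandas 0.25 and later; `ignore_index` needs 1.1 or later) does this directly. A sketch using the frame from the question:

```python
import pandas as pd

d = {'A': [1, 2, 3], 'B': ['X', 'Y', 'Z'], 'C': [['m', 'n', 'o'], ['p', 'q'], ['r']]}
df = pd.DataFrame(d)

# One row per list element; A and B are repeated, and ignore_index renumbers the result 0..n-1.
out = df.explode('C', ignore_index=True)
```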
1 vote, 2 answers, 38 views

Pandas DataFrame: How to neatly select data based on value in particular column?

For a DataFrame, I want to select rows based on the value of certain columns, e.g. for a data frame: import pandas as pd d = {'category': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'c'],...
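A few common selection idioms, sketched on the `category` column from the question plus a hypothetical `value` column (the rest of the dictionary is cut off):

```python
import pandas as pd

# The category column follows the question; the "value" column is hypothetical.
df = pd.DataFrame({
    "category": ["a"] * 4 + ["b"] * 4 + ["c"] * 4,
    "value": range(12),
})

only_a = df[df["category"] == "a"]                            # boolean mask
a_or_c = df[df["category"].isin(["a", "c"])]                  # several values
same_with_query = df.query("category == 'a' and value > 1")   # string expression
```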
0 votes, 1 answer, 13 views

Merge 2 dataframes with similar time indexes

I have 2 dataframes, ts1 and ts2. The data structure looks like this: Date Close 0 2004-08-05 0.0 1 2004-08-06 -155.0 2 2004-08-09 -140.0 3 2004-08-10 -2.0 4 2004-08-11 -24.0 ...
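If "similar" means the two date columns do not line up exactly, `pd.merge_asof` is one option: it matches each row to the nearest earlier key. A sketch using the first ts1 rows shown and a hypothetical ts2:

```python
import pandas as pd

# ts1 uses the first rows shown in the question; ts2 is hypothetical.
ts1 = pd.DataFrame({
    "Date": pd.to_datetime(["2004-08-05", "2004-08-06", "2004-08-09"]),
    "Close": [0.0, -155.0, -140.0],
})
ts2 = pd.DataFrame({
    "Date": pd.to_datetime(["2004-08-05", "2004-08-08", "2004-08-09"]),
    "Close": [1.0, 2.0, 3.0],
})

# For each ts1 row, take the latest ts2 row at or before the same date (both sorted by Date).
merged = pd.merge_asof(ts1, ts2, on="Date", suffixes=("_ts1", "_ts2"), direction="backward")
```

Adding `tolerance=pd.Timedelta("1D")` limits how far back a match may come from; when exact date matches are enough, `pd.merge(ts1, ts2, on="Date", how="outer")` is simpler.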
1 vote, 1 answer, 29 views

Subset pandas timeseries dataframe from looping if statement

Please let me know if the title of my problem is accurate - I think I need a looping if statement to solve the problem below - I am a newbie to Python and programming in general, so don't know if the ...
1 vote, 2 answers, 31 views

Group by first column and display as column

I looked around for quite a while and can't seem to find the right answer. I am trying to group by first column (name), then display the result as column. Any help will be appreciated. I am new to ...
0 votes, 0 answers, 22 views

Pandas apply for a function that returns two values for two different dataframes

I am trying to apply a function that transforms a Series and also returns a number that allows the same transformation to be applied to other data. So I try to transform the column in df_ec and ...
0 votes, 0 answers, 14 views

How do I use a Pandas Styler object in a PowerPoint presentation?

I'm trying to streamline a process at work and ran into a problem with using Pandas Styler objects and PowerPoint. Bottom line, the recipients don't want a Jupyter Notebook or even an HTML version of ...
2 votes, 2 answers, 47 views

How to create a pie chart from a CSV file using Python

I have this CSV data file and I'm trying to make a pie chart from it. I'm a beginner in Python and don't understand how to create a pie chart using the three columns; please help! working ...
-1 votes, 2 answers, 38 views

Pandas forward fill values N times

I have a dataframe with 1 and 0 like the following (see below for full reproducible dataframe): 2019-04-12 05:15:00 0 2019-04-12 05:30:00 1 2019-04-12 05:45:00 0 2019-04-12 06:00:00 1 2019-04-12 ...
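A minimal sketch, assuming the goal is to extend each 1 over the following N rows: hide the zeros as NaN, forward-fill with `limit`, and put the zeros back.

```python
import pandas as pd

# Hypothetical 15-minute series of 0/1 flags like the one in the question.
s = pd.Series([0, 1, 0, 0, 0, 1, 0],
              index=pd.date_range("2019-04-12 05:15", periods=7, freq="15min"))
n = 2  # how many rows after each 1 should also become 1

# Hide the zeros, forward-fill at most n consecutive gaps, then restore the remaining zeros.
filled = s.where(s == 1).ffill(limit=n).fillna(0).astype(int)
```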
1 vote, 1 answer, 15 views

Accessing different columns from DataFrame in transform

I want to write a transformation function accessing two columns from a DataFrame and pass it to transform(). Here is the DataFrame which I would like to modify: print(df) date increment 0 ...
-1 votes, 1 answer, 19 views

How to convert a column of object dtype to int64 in Python pandas

I merged two tables, and as a result one column has dtype object. I need to convert it to int, but I am getting a type error. I used the astype function, but the object is not converted to string, getting the ...
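One common cause is missing values introduced by the merge, which a plain `astype("int64")` cannot hold. A sketch with a hypothetical column, using `pd.to_numeric` and pandas' nullable `Int64` dtype:

```python
import pandas as pd

# Hypothetical merged column: numbers stored as strings, plus a missing value from the merge.
merged = pd.DataFrame({"qty": ["1", "2", None, "4"]})

# to_numeric parses the strings (errors="coerce" turns unparseable entries into NaN);
# the nullable "Int64" dtype keeps integers even when some values are missing.
merged["qty"] = pd.to_numeric(merged["qty"], errors="coerce").astype("Int64")
```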
1 vote, 3 answers, 25 views

Can't drop rows with empty list whilst taking the mean of the other lists

I have a time series df that has 2 columns. I am attempting to drop all the empty lists from the yearly_cost column whilst taking an average of the lists containing floats to create a singular value for ...
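A minimal sketch with a hypothetical date index: filter on list length, then replace each remaining list with its mean.

```python
import pandas as pd

# Hypothetical time series; "yearly_cost" holds lists of floats, some of them empty.
df = pd.DataFrame(
    {"yearly_cost": [[1.0, 3.0], [], [4.0]]},
    index=pd.to_datetime(["2017-01-01", "2018-01-01", "2019-01-01"]),
)

# Keep only rows whose list is non-empty, then replace each list with its mean.
non_empty = df[df["yearly_cost"].map(len) > 0].copy()
non_empty["yearly_cost"] = non_empty["yearly_cost"].map(lambda costs: sum(costs) / len(costs))
```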
0 votes, 1 answer, 28 views

How do I read all excel files with pandas and save them in different Dataframes

I have the following situation: I have a folder with different xlsx files and want to save all the xlsx files in different dataframes, one dataframe per file. After that I want to iterate the ...
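A minimal sketch using `pathlib` and a dict of DataFrames keyed by file name, which avoids creating variables dynamically. Reading `.xlsx` files requires an Excel engine such as openpyxl to be installed.

```python
from pathlib import Path

import pandas as pd


def read_all_excel(folder):
    """Read every .xlsx file in `folder` into its own DataFrame, keyed by file name without extension."""
    return {path.stem: pd.read_excel(path) for path in sorted(Path(folder).glob("*.xlsx"))}
```

For example, `frames = read_all_excel("data")` (hypothetical folder name) followed by `for name, frame in frames.items(): ...` iterates over the files one DataFrame at a time.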
0 votes, 2 answers, 29 views

How to fill the Null for duplicate records in Pandas

I have a df that contains snapshots for JIRA ticket status, df contains multiple snapshots for these tickets hence there are some duplications. I want to fill the null values (as long as the id has ...
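A minimal sketch, assuming the nulls should be filled from other snapshots of the same ticket id; the column names `id` and `status` are hypothetical.

```python
import pandas as pd

# Hypothetical snapshots: several rows per ticket id, some with a missing status.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "status": ["Open", None, None, None, "Done"],
})

# Within each id, fill gaps from earlier snapshots first, then from later ones.
df["status"] = df.groupby("id")["status"].transform(lambda s: s.ffill().bfill())
```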
1 vote, 1 answer, 17 views

Iterate over pandas dataframe and create another dataframe with repetitive records

I have a dataframe act with columns as ['ids','start-yr','end-yr']. I want to create another dataframe timeline with columns as ['ids','years']. using the act df. So if act has fields as ids ...
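A minimal sketch using the column names from the question and hypothetical rows, assuming the end year is included: build a list of years per row, then `explode` it.

```python
import pandas as pd

# Hypothetical rows using the column names from the question.
act = pd.DataFrame({"ids": ["x", "y"], "start-yr": [2001, 2010], "end-yr": [2003, 2010]})

# Build the list of years per row (end year included), then give each year its own row.
years = [list(range(start, end + 1)) for start, end in zip(act["start-yr"], act["end-yr"])]
timeline = (act[["ids"]]
            .assign(years=years)
            .explode("years", ignore_index=True)
            .astype({"years": int}))
```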
0 votes, 1 answer, 50 views

Fast loading and querying data in Python

I am doing some data analysis in Python. I have ~15k financial products identified by ISIN code and ~15 columns of daily data for each of them. I would like to easily and quickly access the data given ...
0 votes, 0 answers, 22 views

How to prevent conversion of float to string when importing a DataFrame to Google Sheets?

I'm trying to send a pandas DataFrame to Google Sheets with pygsheets. All values have a defined data type. For example, when I convert the dataframe with to_excel, Excel reads floats as floats. But when I "send" ...
-2 votes, 0 answers, 19 views

Perfectly running script in python 3.7 gives different results each time in python 3.5. How is this possible?

I developed a script in Python 3.7 which works fine, and the same script has to be migrated to Python 3.5. In Python 3.5 it does run, but gives a different result each time it is run. The script ...
0 votes, 0 answers, 17 views

Append row from list stored in column [duplicate]

I have DataFrame with list in one column and would like to append each list member as a new line. Here an example: import pandas as pd df = pd.DataFrame({'a': ['fruit','vegetable'], 'b': [['orange', '...
0 votes, 1 answer, 25 views

How can I build a dataframe for time-series data with clear timestamps?

For my experiment, I have a formatted csv file which looks like an [N x M] matrix, where N = 40 is the total number of cycles (timestamps) and M = 1440 is the number of pixels. For every cycle, I have 1440 pixel values ...
0 votes, 2 answers, 26 views

How to efficiently match values from 2 series and add them to a dataframe

I have a csv file "qwi_ak_se_fa_gc_ns_op_u.csv" which contains a lot of observations of 80 variables. One of them is geography which is the county. Every county belongs to something called a Commuting ...
0 votes, 1 answer, 26 views

Equivalent of pandas.Series.unique() for non-hashable elements

I would like to know if there is an equivalent for pandas.Series.unique() when the series contains non-hashable elements (in my case, lists). For instance, with >> ds XTR ...
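A minimal sketch for flat lists: convert each list to a hashable tuple, drop duplicates, and convert back. Nested lists would need a recursive conversion. The data is hypothetical.

```python
import pandas as pd

# Hypothetical Series whose elements are lists (unhashable).
ds = pd.Series([[1, 2], [3], [1, 2], [3], [4, 5]])

# Tuples are hashable, so deduplicate them and convert back to lists;
# the order of first appearance is kept.
unique_lists = ds.map(tuple).drop_duplicates().map(list).tolist()
```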
0 votes, 4 answers, 29 views

Avoid for loop to set column values from other columns in pandas

I want to assign to a new column called 'new_col' a CSV-like string of other columns' values. Currently I do as follows: df['new_col'] = (df['a'].map(str) + ',' + df['b'].map(str)) This works ...
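A sketch that generalizes the concatenation in the question to any list of columns; the values are hypothetical. It still iterates over rows internally, so for two or three columns the `+` version from the question is likely faster.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})  # hypothetical values

cols = ["a", "b"]  # any number of columns
# Cast every column to str, then join each row's values with commas.
df["new_col"] = df[cols].astype(str).agg(",".join, axis=1)
```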
2 votes, 2 answers, 20 views

Renaming columns on slice of dataframe not performing as expected

I was trying to clean up column names in a dataframe, but only for some of the columns. Somehow it doesn't work when I try to replace column names on a slice of the dataframe; why is that? Let's say we ...
4 votes, 1 answer, 18 views

How to determine the end of a non-NaN series in pandas

For a data frame df = pd.DataFrame([[np.nan, 3.0, 7.0], [0.0, 5.0, 8.0], [0.0, 0.0, 0.0], [1.0, 3.0, np.nan], [1.0, np.nan, np.nan]], columns=[1, 2, 3], index=pd.date_range('...
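If "end" means the last non-NaN value in each column, `last_valid_index` gives it directly. A sketch with the question's values (its date index is cut off, so the default integer index stands in):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 3.0, 7.0], [0.0, 5.0, 8.0], [0.0, 0.0, 0.0],
                   [1.0, 3.0, np.nan], [1.0, np.nan, np.nan]], columns=[1, 2, 3])

# Label of the last non-NaN value in each column.
ends = df.apply(lambda col: col.last_valid_index())
```

With a date index, the same call returns the timestamp of each column's last valid value.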
0 votes, 1 answer, 21 views

Convert time series data from csv to netCDF python

The main problem in this process is the code below: precip[:] = orig, which produces the error: ValueError: cannot reshape array of size 5732784 into shape (39811,144,144) I have two CSV files, one of ...
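The numbers in the error message already pin down the mismatch. In the sketch below, the target shape is read as time x two 144-wide grid dimensions; those dimension names are hypothetical.

```python
# Numbers taken from the error message in the question.
n_values = 5732784
n_time, n_lat, n_lon = 39811, 144, 144

print(n_values == n_time * n_lat)          # True: 144 values per time step
print(n_time * n_lat * n_lon)              # 825520896 values needed for the 3-D shape
print(n_values % (n_lat * n_lon) == 0)     # False: the data cannot fill whole 144 x 144 grids
```

So `orig` holds 144 values per time step rather than a full 144 x 144 grid per step; how those 144 values map onto the grid depends on the CSV layout, which the excerpt does not show.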
0 votes, 0 answers, 30 views

How to store a pandas dataframe in the smallest format possible?

There is a lot of documentation on the most efficient way to store pandas dataframes (e.g. How to store a dataframe using Pandas), but most of the resources focus on I/O time efficiency. I would like ...
0 votes, 1 answer, 15 views

Improve Harmonic Mean efficiency in Pandas pivot_table

I'm applying harmonic mean from scipy.stats as the aggfunc parameter in Pandas pivot_table, but it is slower than a simple mean by orders of magnitude. I would like to know if this is expected ...
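A likely reason for the slowdown is that a Python-level function is called once per group, whereas "mean" uses pandas' optimized built-in aggregation. The sketch below computes the harmonic mean, n / sum(1/x), from two built-in aggregations instead (hypothetical data; assumes strictly positive values, as the harmonic mean requires).

```python
import pandas as pd

df = pd.DataFrame({"g": ["a", "a", "b", "b"], "x": [1.0, 4.0, 2.0, 2.0]})  # hypothetical data

# Harmonic mean per group = count / sum of reciprocals; both are fast built-in aggregations.
inv = (1.0 / df["x"]).groupby(df["g"])
hmean = inv.count() / inv.sum()
```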
1 vote, 2 answers, 33 views

Split a Range of Numbers into different Rows - Pandas

I have a dataframe having column values like this: num_range id description '5000-6000' 1 lmn '6100-6102' 1 lmn '6363-6363' 3 xyz 'Q7890-Q8000' 2 ...
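A minimal sketch, assuming both ends of a range share the same non-numeric prefix (as in 'Q7890-Q8000') and have no leading zeros; the ranges are shortened and the last description is hypothetical.

```python
import re

import pandas as pd

# Hypothetical rows in the shape shown in the question (ranges kept short).
df = pd.DataFrame({
    "num_range": ["6100-6102", "6363-6363", "Q7890-Q7892"],
    "id": [1, 3, 2],
    "description": ["lmn", "xyz", "abc"],
})


def expand_range(text):
    """Turn 'Q7890-Q7892' into ['Q7890', 'Q7891', 'Q7892']; assumes both ends share a prefix."""
    start, end = text.split("-")
    prefix = re.match(r"\D*", start).group()  # leading non-digits, e.g. 'Q' or ''
    first, last = int(start[len(prefix):]), int(end[len(prefix):])
    return [f"{prefix}{n}" for n in range(first, last + 1)]


out = df.assign(num=df["num_range"].map(expand_range)).explode("num", ignore_index=True)
```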
0 votes, 1 answer, 14 views

bokeh line graph for 3 lines

I have data for various years and months that I want to display as a 3-line graph, one line per category, with the months (Jan, Feb, ..., Dec) on the x-axis and sales on the y-axis. I am confused about how to do this as I am new to Bokeh and ...
1 vote, 2 answers, 37 views

How to set values for several columns without loop

How can I set values for several columns without loop? df.loc[:, ['test2', 'test3']] = 0 or df[['test2', 'test3']] = 0 I expect it to set values 0 in columns 'test2' and 'test3', but it returns ...
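A sketch that works whether or not the columns already exist (the excerpt is cut off before the actual error): `assign` with one keyword per column. The frame is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"test1": [1, 2, 3]})  # hypothetical frame

# assign() creates or overwrites each named column; dict.fromkeys builds {"test2": 0, "test3": 0}.
df = df.assign(**dict.fromkeys(["test2", "test3"], 0))
```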
5 votes, 4 answers, 55 views

Delete rows preceding and following a row containing NaN in Python?

I am trying to clean experimental data using Python with numpy and pandas. Some of my measurements are implausible. I want to remove these measurements and the 2 preceding and 2 following ...
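A minimal sketch using a centered rolling window over a NaN flag, with k = 2 rows on each side as in the question; the measurements are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements; NaN marks an implausible value.
df = pd.DataFrame({"m": [1.0, 2.0, 3.0, np.nan, 5.0, 6.0, 7.0, 8.0]})
k = 2  # rows to drop on each side of a NaN

bad = df["m"].isna().astype(int)
# A centered window of 2k+1 rows flags every row within k positions of a NaN.
near_bad = bad.rolling(2 * k + 1, center=True, min_periods=1).max().astype(bool)
clean = df[~near_bad]
```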
1 vote, 1 answer, 14 views

Join dots in scatter plot with lines throws an error

I've been following this post in order to connect points in a scatter plot with lines, the written code is: import pandas as pd import matplotlib.pyplot as plt #data exploration data = pd.read_csv("...
2 votes, 1 answer, 17 views

Expand list column on combinations and keep other data

How to unnest (explode) a column in a pandas DataFrame? I believe this question is not a duplicate of the one listed above. I am trying to find the combination of the cells in a column and create two ...