Questions tagged [data-science]

Data science concerns extracting knowledge or insights from data, in whatever shape or form. It can include predictive analytics and usually involves a lot of data wrangling. Consider posting on https://datascience.stackexchange.com/

0 votes, 0 answers, 2 views

How to implement the fusion layer technique in PyTorch?

I'm currently building an image colorization model. I want to use the fusion layer presented by Iizuka et al., but I'm having trouble implementing it in PyTorch. The basic idea is ...
-1 votes, 0 answers, 9 views

How to create a front end for collecting user data for making a tree? [on hold]

I'm making a tree-based application that asks users certain questions about car insurance, based on their inputs. I have people working in this field who are ready to give me question inputs if ...
0 votes, 0 answers, 19 views

How to decide between categorical and discrete columns

I am currently working on the Boston competition and I'm at the stage of refining my features. I've gathered what I presumed to be categorical and discrete columns and placed them in their ...
0 votes, 0 answers, 14 views

How to classify time series trends into 2 groups: “contain seasonality” and “doesn't contain seasonality”

I'm optimizing a prediction model for time series trends. Each trend may or may not have a seasonal effect. I want to classify each trend into one of the following groups: "seasonality" or "no ...
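One simple heuristic (not a formal statistical test) is to remove a linear trend by differencing once, then check the autocorrelation at the suspected seasonal lag. The sketch below assumes the period is known in advance (e.g. 12 for monthly data); the function names and the 0.5 threshold are hypothetical and would need tuning against trends you have already labelled.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag (0.0 if undefined)."""
    n = len(series)
    if n == 0 or lag >= n:
        return 0.0
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0  # constant series: no seasonal structure to detect
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var


def has_seasonality(series, period, threshold=0.5):
    """Heuristic: difference once to remove a linear trend, then flag the series
    as seasonal when the autocorrelation at the seasonal lag exceeds `threshold`."""
    diffed = [b - a for a, b in zip(series, series[1:])]
    return autocorrelation(diffed, period) > threshold


seasonal = [i + s for i, s in enumerate([0, 1, 0, -1] * 6)]  # trend + period-4 pattern
trend_only = list(range(24))
print(has_seasonality(seasonal, period=4))    # True
print(has_seasonality(trend_only, period=4))  # False
```

Differencing first matters: a strong trend alone produces high autocorrelation at every lag, which would otherwise be mistaken for seasonality.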
-2 votes, 0 answers, 20 views

Is there a difference between data and value?

When I was doing my undergraduate degree, I learned that data is raw fact with no meaning of its own, and that nothing more basic exists. Recently, in the book "Think Python: How to Think Like a Computer Scientist", ...
0 votes, 1 answer, 7 views

How can I select features for a prediction model using caret with categorical variables?

I found the caret package in R very helpful for seeing variable importance for modeling. However, all the variables in my dataset are categorical, and in this case the 'varImp' command returns variable importance ...
1 vote, 1 answer, 12 views

Adding dictionary with unique keys to DataFrame without unique keys

I am trying to compute descriptive statistics of a DataFrame using GroupBy and put those values back into the DataFrame. My DataFrame contains a non-unique running number which identifies a person (...
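In pandas, the usual tool for this is `groupby(...).transform(...)`, which returns a result aligned with the original rows rather than one row per group. The library-free sketch below shows the same two steps explicitly: build a dictionary with one statistic per unique key, then copy it back onto every row sharing that key. The column names `person` and `value` are hypothetical stand-ins for the question's columns.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: "person" is the non-unique running number, "value" a measurement.
rows = [
    {"person": 1, "value": 10.0},
    {"person": 1, "value": 14.0},
    {"person": 2, "value": 3.0},
    {"person": 2, "value": 5.0},
    {"person": 2, "value": 7.0},
]


def add_group_stat(rows, key, field, stat=mean, out_field="group_mean"):
    """Compute one statistic per unique key, then attach it to every row with that key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[field])
    # Dictionary with unique keys: one statistic per person.
    per_key = {k: stat(vals) for k, vals in groups.items()}
    # Broadcast back onto the (non-unique) rows without mutating the input.
    return [{**row, out_field: per_key[row[key]]} for row in rows]


result = add_group_stat(rows, "person", "value")
for r in result:
    print(r)
```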
0 votes, 0 answers, 13 views

Exported DataFrame does not show the complete data when a column's string length is 2400

I am trying to export a DataFrame from Python in which one column has strings of length 2400. However, after exporting it to a CSV file, the data is incomplete. Kindly help.
1 vote, 2 answers, 18 views

Filtering a pandas DataFrame by one column and getting the sum of values in another column

I have a DataFrame with multiple columns (8-10), one of which is the year column. I have another column called the arrival column. The year column contains data from 3 years: 2018, 2019 and 2020....
-2 votes, 1 answer, 11 views

How to make a word cloud for each cluster in k-means

I am trying to print the data points in each cluster using a word cloud. My data points are vectorized data (BOW). How can I print the words in each cluster using a word cloud? I have already found the optimal k for k-means ...
0 votes, 1 answer, 17 views

I want to plot AUC with respect to decision tree depth while min_samples_split changes

I want to plot the train AUC and CV AUC with respect to the depth of a decision tree model while the min_samples_split value changes, as shown in the code. If I fix min_samples_split = 5 or 10, then ...
-1 votes, 0 answers, 21 views

Matplotlib plot not displayed after running code successfully

I am able to run the code below successfully, but the plot is not displayed as expected. I'd appreciate it if someone can help. def plot_rolling(df): fig, ax = plt.subplots(3) ax[0].plot(df.index, df.data1, label='x') ...
0 votes, 0 answers, 7 views

Detailed description when hovering over a point in a pointplot using Python

I have a point plot with weeks on the x-axis and scores on the y-axis. When hovering over a point on that graph, I want a pop-up that shows my observations. Is this possible? fig,ax1= plt....
0 votes, 1 answer, 28 views

Calculating Quantiles based on a column value?

I am trying to figure out a way to calculate quantiles in pandas or Python based on a column value. Also, can I calculate multiple different quantiles in one output? For example, I want to ...
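A minimal sketch of per-group quantiles using only the standard library: group values by the key column, then call `statistics.quantiles` on each group, which returns several cut points in one call (quartiles for `n=4`). `method="inclusive"` interpolates linearly between order statistics, which as far as I know corresponds to pandas' default `interpolation='linear'`; in recent pandas the grouped equivalent is `groupby(...)[...].quantile([...])`. The data and the name `grouped_quantiles` are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical (group, value) pairs; "group" is the column whose value selects the subset.
data = [("a", v) for v in range(1, 10)] + [("b", v) for v in (10, 20, 30, 40, 50)]


def grouped_quantiles(pairs, n=4):
    """Return the n-quantile cut points (n - 1 values) for each group."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: quantiles(values, n=n, method="inclusive") for key, values in groups.items()}


print(grouped_quantiles(data))  # {'a': [3.0, 5.0, 7.0], 'b': [20.0, 30.0, 40.0]}
```

Changing `n` gives other sets of quantiles in the same output, e.g. `n=10` for deciles.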
-1 votes, 0 answers, 23 views

GRACE data processing and analysis using R [on hold]

How can I process and analyse GRACE (Gravity Recovery and Climate Experiment) data using R for terrestrial hydrology monitoring?
0 votes, 1 answer, 50 views

Handling missing values when 99% of the data is missing from most columns (important ones)

I am facing a dilemma with a project of mine. A few of the variables don't have enough data: almost 99% of their observations are missing. I am thinking of a couple of options: impute missing ...
0 votes, 0 answers, 8 views

Interpreting the result of the seasonal decomposition method in scipy

I have done a seasonal decomposition of the data, but I can't figure out what the residuals mean. How do you interpret the residuals from a seasonal decomposition to gain helpful insights from them?
0 votes, 0 answers, 21 views

Binning - Equal Frequency: Boundaries and Intervals

I'm currently learning binning methods, but I'm struggling with equal-frequency binning. While learning, I came across the following example, which I don't fully understand. The dataset looked like this: ...
0 votes, 2 answers, 46 views

Python Pandas - Concat two data frames with different number of rows and columns

I have two data frames with different numbers of rows and columns. Both tables have a few common columns, including "Customer ID". The tables look like this, with sizes of 11697 rows × 15 columns and 385839 ...
-2 votes, 0 answers, 20 views

Is there any machine learning model that can predict handwritten text accurately? If so, where can I get the code for it?

I urgently need to build a machine learning model that can detect handwritten text. Could someone please suggest the best available model, with code, to implement in my project? I have ...
1 vote, 3 answers, 41 views

How to play around with JSON date format?

I have a JSON dataset of dates and am trying to calculate the time difference between two JSON datetimes. For example: '2015-01-28T21:41:38.508275' - '2015-01-28T21:41:34.921589'. Please look ...
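JSON has no native date type, so these values are ISO 8601 strings; parsing them into `datetime` objects makes subtraction work and yields a `timedelta`. A minimal sketch using the two timestamps from the question (the helper name `seconds_between` is hypothetical; `datetime.fromisoformat` needs Python 3.7+):

```python
from datetime import datetime


def seconds_between(later: str, earlier: str) -> float:
    """Parse two ISO 8601 timestamps and return (later - earlier) in seconds."""
    t1 = datetime.fromisoformat(later)
    t0 = datetime.fromisoformat(earlier)
    return (t1 - t0).total_seconds()


diff = seconds_between("2015-01-28T21:41:38.508275", "2015-01-28T21:41:34.921589")
print(diff)  # 3.586686
```

If the strings carry a trailing `Z` or a UTC offset, older Python versions may reject them in `fromisoformat`; `datetime.strptime` with an explicit format is an alternative in that case.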
0 votes, 1 answer, 28 views

How to fix: 'only integers, slices (`:`), ellipsis (`…`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices'?

I'm trying to predict heart disease in patients using a linear regression algorithm, and I get this error (only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer ...
0 votes, 0 answers, 23 views

Logistic regression: LinAlgError: Singular matrix

I am performing logistic regression on data so that I can predict who has responded to the company, but I am getting a 'singular matrix' error. import statsmodels.api as sm logit = sm.Logit(train['...
0 votes, 1 answer, 32 views

R: loop through the independent variables in the lm function

I am having a problem building an lm model from many independent variables in a for loop. 14 different independent variables (x1, x2, x3, ..., x14) are created in each iteration of the for loop, and as a result ...
0 votes, 1 answer, 27 views

Cannot open a csv file

I have a CSV file that I need to work with in my Jupyter notebook. Even though I am able to view the contents of the file using the code in the picture, when I try to convert the data into a ...
0 votes, 3 answers, 53 views

How to read a CSV file every other row

How do I take every second row of data from a CSV file? For example, if I have a file that looks like this:

       0    1
    0  23   34
    1  45   45
    2  78   16
    3  110  78
    4  48   14
    5  76   23
    6  55   33
    7  12   13
    8  18   76

how can ...
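A minimal sketch with the standard `csv` module: read the header, then use `itertools.islice` with a step of 2 to keep every second data row. It assumes the file was written with its row index as the first column (as the sample above suggests), so the header line starts with an empty field; `every_other_row` is a hypothetical helper name.

```python
import csv
import io
from itertools import islice

# Assumption: a header line followed by the rows shown in the question,
# with the row index stored as the first column.
CSV_TEXT = """\
,0,1
0,23,34
1,45,45
2,78,16
3,110,78
4,48,14
5,76,23
6,55,33
7,12,13
8,18,76
"""


def every_other_row(f, start=0):
    """Yield the header, then every second data row beginning at data row `start`."""
    reader = csv.reader(f)
    yield next(reader)
    # islice(iterable, start, None, 2) keeps rows start, start + 2, start + 4, ...
    yield from islice(reader, start, None, 2)


rows = list(every_other_row(io.StringIO(CSV_TEXT)))
for row in rows:
    print(row)
```

With a real file, pass `open(path, newline="")` instead of the `StringIO`. If the data is already in a pandas DataFrame, `df.iloc[::2]` makes the same selection.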
0 votes, 1 answer, 26 views

Sentiment Intensity Analyzer

I am getting 4 values for each row from sid.polarity_scores(row), as I want. But for each row, I want the 1st value to go into the 1st empty list I created, the 2nd value of each row to go into ...
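Assuming `sid` is NLTK's VADER `SentimentIntensityAnalyzer`, `polarity_scores` returns a dictionary with the keys `neg`, `neu`, `pos` and `compound`, so the values can be routed into one list per key instead of by position. The score dictionaries below are hypothetical stand-ins for real analyzer output, so the sketch runs without NLTK.

```python
# Hypothetical score dicts shaped like VADER's polarity_scores() output.
scores_per_row = [
    {"neg": 0.0, "neu": 0.6, "pos": 0.4, "compound": 0.6},
    {"neg": 0.3, "neu": 0.7, "pos": 0.0, "compound": -0.4},
]

KEYS = ("neg", "neu", "pos", "compound")


def split_scores(score_dicts, keys=KEYS):
    """Return one list per key, so each row's i-th score lands in the i-th list."""
    columns = {key: [] for key in keys}
    for scores in score_dicts:
        for key in keys:
            columns[key].append(scores[key])
    return columns


cols = split_scores(scores_per_row)
neg, neu, pos, compound = (cols[k] for k in KEYS)
print(neg, neu, pos, compound)
```

With the real analyzer, `scores_per_row` would be built as `[sid.polarity_scores(row) for row in rows]`. Looking values up by key avoids depending on the dictionary's ordering.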
1 vote, 0 answers, 19 views

How do I create multiple dataframes according to the values in a column? (big data)

I have a DataFrame with 30 thousand rows and 15 columns. One of the columns, called "Account", specifies the account used. For example, many rows have the value "A" or "B", but it is impossible (...
1 vote, 2 answers, 22 views

MongoDB - transform an array of objects (key: 'keyname', value: 'value') into fields named 'keyname' with the corresponding value

The current structure of my MongoDB documents is: { "_id": "5c9376110a32bd172c0c5a28", "timestamp": 1553168075444, "content": [ { "name": "temperature_x", "value": 2 }, {...
-1 votes, 1 answer, 29 views

How to design a tree to ask questions to make a decision?

I'm trying to make a program that asks a series of questions and returns a suggestion at the end. How could I do this? I tried using trees, but couldn't get it to work properly. For example, ...
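One simple representation is a nested dictionary: internal nodes hold a yes/no question and two children, and leaves are plain strings holding the suggestion. Walking the tree then means asking the current node's question and following the matching branch until a leaf is reached. The questions and suggestions below are hypothetical placeholders; the `ask` parameter makes the walker testable with scripted answers and interactive with `input`.

```python
# A hypothetical question tree: dict nodes ask a yes/no question,
# string leaves are the final suggestion.
TREE = {
    "question": "Is the car less than 5 years old?",
    "yes": {
        "question": "Do you drive more than 15,000 km per year?",
        "yes": "Suggest comprehensive cover",
        "no": "Suggest third-party, fire and theft",
    },
    "no": "Suggest third-party only",
}


def walk(tree, ask):
    """Ask questions via ask(prompt) -> answer until a leaf (suggestion) is reached."""
    node = tree
    while isinstance(node, dict):
        answer = ask(node["question"] + " (yes/no) ").strip().lower()
        if answer not in ("yes", "no"):
            continue  # re-ask the same question on invalid input
        node = node[answer]
    return node


# Interactive use: print(walk(TREE, input))
scripted = iter(["yes", "no"])
print(walk(TREE, lambda prompt: next(scripted)))  # Suggest third-party, fire and theft
```

Because the tree is plain data, domain experts' questions could be loaded from a JSON file instead of being hard-coded.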
-4 votes, 0 answers, 33 views

What approach should I take to model a forecasting problem in machine learning? [on hold]

I have a dataset that contains 4000k rows and 6 columns. The goal is to predict the travel time demand of a taxi. I have read many articles about how to approach the problem, and every writer tells ...
0 votes, 0 answers, 14 views

Does Python's datatable package support out-of-memory datasets?

datatable is a relatively new, high-performance DataFrame/data.table alternative for Python. The datatable documentation states: It focuses on: big data support, high performance, both in-memory ...
0 votes, 0 answers, 26 views

Joining a table on a column that needs to be cast before it can be joined on

I am trying to join on a column that needs to be converted or cast to varchar to match the same column in a different table. But the way I am trying it here, I get the error '<>' cannot be applied ...
0 votes, 1 answer, 19 views

How to use computational results of a CSV as search terms in Python/Pandas?

First off, in my real situation I handle much bigger datasets, but for this minimal, reproducible example (reprex), let's assume I have two .csv files. They look like this: File 1 is called "...
-3 votes, 0 answers, 17 views

Create a bar plot where each manufacturer is on the y-axis and the height of the bars depicts the number of cereals manufactured by them

Create a bar plot where each manufacturer is on the y-axis and the height of the bars depicts the number of cereals manufactured (m_name) by each company. The problem is that I don't have any values. Q1 :- ...
0 votes, 0 answers, 48 views

Reverse naive Bayes (which feature is the most likely cause of the response variable?)

I'm working with time series sales data for 5 products (A, B, C, D, E), where total revenue is the sum of the revenue of all 5 products. My goal is to 1) predict what my total revenue will be ...
0 votes, 1 answer, 19 views

Aliasing a table in a window function?

I am trying to alias a table in a window function, but I'm not sure what I am doing wrong: when I alias it, I get an error that the columns cannot be resolved. SELECT e.city, e.time, e.day,...
0 votes, 1 answer, 23 views

Making both day-first and month-first dates in a csv file day-first

I have a CSV file with a column of dates. The dates are ordered by month, so January comes first, then February, and so on. The problem is that some of the dates are in mm/dd/yyyy format and others in dd/...
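Mixed mm/dd and dd/mm dates cannot be disambiguated in general, but a field greater than 12 can only be a day. The sketch below uses that rule and returns `None` for genuinely ambiguous dates (both fields 12 or less and different), which could then be resolved from the month ordering of the file mentioned above. This is a heuristic under stated assumptions, not a general solution; `to_day_first` is a hypothetical name.

```python
from datetime import date


def to_day_first(text):
    """Normalize 'a/b/yyyy' to 'dd/mm/yyyy' when the order can be inferred.

    Heuristic: a first field > 12 must be a day (dd/mm); a second field > 12
    must be a day (mm/dd). If both fields are <= 12 and differ, the date is
    ambiguous and None is returned so it can be resolved separately.
    """
    a, b, year = (int(part) for part in text.split("/"))
    if a > 12:        # dd/mm/yyyy
        day, month = a, b
    elif b > 12:      # mm/dd/yyyy
        day, month = b, a
    elif a == b:      # same result either way
        day, month = a, b
    else:
        return None   # ambiguous
    d = date(year, month, day)  # raises ValueError for impossible dates
    return d.strftime("%d/%m/%Y")


for s in ["01/25/2019", "25/01/2019", "03/03/2019", "02/03/2019"]:
    print(s, "->", to_day_first(s))
```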
0 votes, 1 answer, 23 views

LSTM Algorithm Produces Same Results for all Inputs

So, I am currently working on a machine learning algorithm problem pertaining to car speeds and angles, and I'm trying to improve upon some of my work. I recently got done with an XGBRegressor that ...
-1 votes, 1 answer, 33 views

Is there a way to take the values from one column of a DataFrame and append them to a column of a different DataFrame in pandas?

I'm working with 2 DataFrames, A and B, of different shapes. DataFrame A has 193 rows and 33 columns; DataFrame B has 2 rows and 196 columns. I want to be able to take a column from DataFrame A "...
-4 votes, 0 answers, 25 views

Is there a way to indicate that an object does not fit perfectly in a video frame? [closed]

I need to build an application to inspect automobiles. I am trying to build it in Python, and it should suggest in which direction the user has to move the frame (tablet/mobile device) in ...
0 votes, 0 answers, 35 views

ValueError: could not convert string to float: When reading .csv dataset

Why do I keep getting this error? I suspect it is because get_dummies does not work for categorical data. Should I label-encode my data first? Any help is appreciated. import pandas as pd from ...
1 vote, 1 answer, 52 views

Using cosine similarity for classifying documents

I have a set of files in five different categories, and most of them are not labelled correctly. The objective is to predict the correct category of a file whenever it is uploaded. I used cosine ...
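One common way to use cosine similarity for classification is nearest-centroid: build one bag-of-words vector per category from the labelled files, then assign a new file to the category whose vector is most cosine-similar. The sketch below uses plain `Counter` objects and whitespace tokenization; the categories and texts are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[term] for term, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def build_centroids(labelled_docs):
    """Sum the term counts of all documents per category (an unnormalized centroid)."""
    centroids = {}
    for label, text in labelled_docs:
        centroids.setdefault(label, Counter()).update(vectorize(text))
    return centroids


def classify(text, centroids):
    """Assign the category whose centroid is most cosine-similar to the document."""
    vec = vectorize(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


training = [
    ("invoice", "invoice total amount due payment"),
    ("invoice", "payment due invoice number"),
    ("contract", "agreement party terms signed contract"),
]
centroids = build_centroids(training)
print(classify("please pay the amount due on this invoice", centroids))  # invoice
```

Since most of the existing files are mislabelled, centroids built from them will be noisy; cleaning even a small labelled subset first is likely to matter more than the similarity measure. TF-IDF weighting instead of raw counts is a common refinement.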
-3 votes, 1 answer, 57 views

How to group columns with similar names to find the sum, min, and max?

I'm importing a CSV file that contains transposed data. The data has columns in the following format: AC1,AC2,AD1,AD2,BP1,BP2,CT1,CO1,CO2,CS1, etc. What I've been hoping to accomplish is to group ...
0 votes, 0 answers, 33 views

LSTM Producing Same Predictions for any Input

So, I am currently working on a machine learning algorithm problem pertaining to car speeds and angles, and I'm trying to improve upon some of my work. I recently got done with an XGBRegressor that ...
0 votes, 1 answer, 28 views

NLP text classification based on user comments

I am new to machine learning and wanted to work on this problem. I have some user comments about products, and based on those comments, my model should summarize and give me ...
-2 votes, 1 answer, 37 views

How to decode geohash using python in pandas?

I need code to decode geohashes in Python. I have a column that contains geohashes, and I need them decoded into latitude and longitude.
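A geohash is a base-32 string in which each character carries 5 bits; the bits alternate between halving the longitude interval (starting at [-180, 180]) and the latitude interval ([-90, 90]). Decoding replays those halvings and returns the centre of the final cell. A dependency-free sketch of that standard algorithm:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def decode_geohash(geohash):
    """Decode a geohash to the (latitude, longitude) of the centre of its cell.

    Bits alternate between refining longitude (even bit positions) and
    latitude (odd bit positions), most significant bit of each character first.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True
    for char in geohash.lower():
        value = BASE32.index(char)  # raises ValueError on invalid characters
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


print(decode_geohash("ezs42"))  # (42.60498046875, -5.60302734375)
```

For a DataFrame column, `df["geohash"].apply(decode_geohash)` would produce a column of (latitude, longitude) tuples, where `df` and `"geohash"` are placeholder names. Longer geohashes give smaller cells and therefore more precise centres.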
0 votes, 1 answer, 18 views

What is the best approach to implement Time series forecasting to predict future customer orders?

I have 2 years of historical data on customers, items ordered and the number of orders. Based on this data, I am trying to predict future sales at the customer-item level. I tried an ARIMA model ...
0 votes, 0 answers, 8 views

TensorBoard filter/query by metric

After running a grid search for several days/weeks, TensorBoard looks something like this: It's pretty hard to make sense of as is. How do folks analyze their models in a case like this, where you have ...
3 votes, 2 answers, 64 views

Keras MLP classifier not learning

I have data like this: there are 29 columns, out of which I have to predict winPlacePerc (the last column of the DataFrame), which ranges from 1 (high perc) to 0 (low perc). Out of the 29 columns, 25 are numerical data ...