AttributeError: 'DataFrame' object has no attribute 'path'

I'm trying incrementally to build a financial statement database. The first steps center around collecting 10-Ks from the SEC's EDGAR database. I have code for pulling the relevant

Solution 1:

Regarding the error: the function is os.path.join, not pd.path.join. You are calling it on the wrong module.

That being said, your code would not do what you want even without the error: folder_name is not updated for each row. Inside the iterrows() loop you can use row.cik to get the value for the current row:

# str() in case the CIK was parsed as an integer; "dir" would shadow a built-in
cik_dir = os.path.join(folder_path, str(row.cik))
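A minimal sketch of the per-row fix, assuming the data has a cik column as in the question. The function name make_cik_dirs, the Row type, and the sample values are hypothetical. To keep the example free of third-party imports it uses namedtuples; with pandas, df.itertuples(index=False) yields rows that support the same row.cik access.

```python
import os
import tempfile
from collections import namedtuple


def make_cik_dirs(rows, folder_path):
    """Create one sub-folder per row, named after that row's cik value."""
    created = []
    for row in rows:
        # str() because a CIK read from a file is often parsed as an integer,
        # and os.path.join expects string (or path-like) components.
        cik_dir = os.path.join(folder_path, str(row.cik))
        os.makedirs(cik_dir, exist_ok=True)
        created.append(cik_dir)
    return created


# Made-up stand-in rows, for illustration only.
Row = namedtuple("Row", ["cik", "company"])
sample_rows = [Row(1111, "Example Co A"), Row(2222, "Example Co B")]

with tempfile.TemporaryDirectory() as base:
    dirs = make_cik_dirs(sample_rows, base)
    print([os.path.basename(d) for d in dirs])
```

The key point is that the path is rebuilt inside the loop, so each row gets its own folder.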

Solution 2:

It is fairly unclear what you are trying to accomplish, particularly with the .csv files and pandas. The code has a number of odd errors in it, which I think would be resolved by going back over some of the simpler Python concepts before attempting something as difficult as web scraping. I don't mean that you should give up, rather that building up the fundamentals is a necessary step in this type of project.

That said, if I'm understanding your intent correctly, you want to create a file hierarchy for 10-K, 10-Q, etc. filings for several CIKs.

There shouldn't be any need to use .csv files, or pandas for this.

Probably the simplest way to do this would be to do it in the same step you download them.

Pseudocode for this would be as follows:

for cik in list_of_ciks:
     first_file = find_first_file_online()

     if first_file is 10-K:
          save_to_10-K folder for CIK
     if first_file is 10-Q:
          save_to_10-Q folder for CIK
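The pseudocode above could be sketched in Python roughly as follows. The filing records, the form_type key, and the helper name folder_for are assumptions made for illustration, and the step that actually finds filings on EDGAR is left out.

```python
from pathlib import PurePosixPath

# Form types we want to keep; anything else is skipped.
WANTED_FORMS = {"10-K", "10-Q"}


def folder_for(cik, form_type):
    """Return the relative folder for a filing, or None if the form is not wanted."""
    if form_type not in WANTED_FORMS:
        return None
    return PurePosixPath(str(cik)) / form_type


# Made-up records standing in for whatever the download step returns.
filings = [
    {"cik": 1111, "form_type": "10-K"},
    {"cik": 1111, "form_type": "8-K"},
    {"cik": 2222, "form_type": "10-Q"},
]

for filing in filings:
    target = folder_for(filing["cik"], filing["form_type"])
    if target is not None:
        print(target)
```

Using a set of wanted form types instead of a chain of if statements makes it easy to add further report types later.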

As I said above, you can skip the .csv file. (Also, note that CSV stands for "comma-separated values." Some of the entries in your data contain commas, e.g. "4Less Group, Inc." Unless such entries are quoted, the comma splits the single entry into two columns, shifting the rest of that row one column to the right.)
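If you do keep a CSV file, the comma problem is easy to reproduce and to avoid. The sketch below uses Python's standard csv module; the CIK and form type in the sample row are made up, and only the company name comes from the question. A naive split on commas breaks the name apart, while csv.writer quotes the field so that csv.reader reads it back as one value.

```python
import csv
import io

# Sample row: the CIK and form type are made up for illustration.
row = ["1111", "4Less Group, Inc.", "10-K"]

# Naive joining and splitting breaks the company name into two fields.
naive_line = ",".join(row)
print(naive_line.split(","))

# csv.writer quotes the field that contains a comma, so it round-trips intact.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
buffer.seek(0)
parsed = next(csv.reader(buffer))
print(parsed)
```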

When you process the data, you'll want to build the folders as you go.

When you start on a new CIK, create the master folder for that CIK. When you encounter a 10-K, create a folder for 10-Ks and save the file with a unique name. Since you need the accession numbers to get the Excel sheets anyway, the accession number wouldn't be a bad naming convention to follow.
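One way to create the folders as you go is pathlib's mkdir with parents=True and exist_ok=True. This is only a sketch: the helper name report_path is hypothetical, and the CIK and accession number are made-up values.

```python
import tempfile
from pathlib import Path


def report_path(root, cik, report_type, accession_number):
    """Build root/CIK/report_type/accession.xlsx, creating folders as needed."""
    folder = Path(root) / str(cik) / report_type
    # parents=True creates the CIK folder too; exist_ok=True makes reruns safe.
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"{accession_number}.xlsx"


with tempfile.TemporaryDirectory() as root:
    # Made-up CIK and accession number, for illustration only.
    path = report_path(root, 1111, "10-K", "0000000000-00-000000")
    path.write_bytes(b"placeholder for the downloaded workbook")
    print(path.relative_to(root))
```

Because exist_ok=True is set, calling the helper again for a second 10-K under the same CIK reuses the existing folders.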

It would be something like this:

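import pathlib
import requests

cik_list = [cik_1, cik_2, ..., cik_n]  # placeholder: your own CIKs

for cik in cik_list:
     # placeholder URL: build the real one from the CIK and accession number
     response = requests.get("cik/accession/Report.xlsx")

     # report_type and accession_number come from your own filing data
     target = pathlib.Path(str(cik), report_type, accession_number + ".xlsx")
     target.parent.mkdir(parents=True, exist_ok=True)  # create folders as you go

     with open(target, "wb") as excel_file:
          # the raw bytes of a requests response are in .content
          excel_file.write(response.content)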

The above code will not run as written and does not include everything you need to make it work, since those pieces already exist in your own code. Integrating the above concepts into your code is up to you.

To reiterate, you have the CIK, the accession number, and the report type. To save the files in folders, you only need to create the folders as you go, in the form "CIK/report_type/accession.xlsx".
