I'm building a file management app and need to know the best approach for dealing with folders

softwareengineering.stackexchange https://softwareengineering.stackexchange.com/questions/400565

  •  03-03-2021

Question

So you open up file explorer on your computer and can create folders, add files, rename, etc. I'm doing the same thing in a web application that interacts with the file system on the server.

I'm somewhat early on in the project and I'm concerned about my approach on the folder structure.

Right now, I'm using a guid to represent the folder and the real filename to represent the file, although I'm securing the file by appending .resources to its name to prevent IIS from serving it.

So for example, if the visual structure in the app is like this:

Root
 - Parent A
 - - Child 
 - - - Grandchild
 - Parent B

In the real folder system it looks like this:

Root
 - a94a6d29-e896-4490-b2db-84bcd2c63c06 <- Parent A
 - asdf334f-e896-4490-b2db-84bcd2c63c45 <- Child
 - a94a5f29-e896-4490-b2db-asdfasdf344s <- Grandchild
 - f333ffdd-e896-4490-b2db-84bcd2c63c06 <- Parent B

I've run into issues in the past with folder/filename or path lengths being too long, so I'm trying to circumvent that with this approach. Then I started to think about performance. What if users end up with a massive number of folders overall, nested "visually" in a neat fashion, while on the server there are hundreds of thousands of folders in a single directory?

If this could be potentially problematic then perhaps instead of a guid for the path I should nest them and just use a numeric ID structure of some sort.

Is this approach correct or should I change paths and nest them on the server as well?

Solution

"If this could be potentially problematic then perhaps instead of a guid for the path I should nest them and just use a numeric ID structure of some sort."

Your file system is much less of a tree than it appears to be. Much like how you are emulating a tree structure with an underlying flat list, your OS's file system mostly does the same thing.

I'm not going to delve into the specifics here, but as a basic example: if you take a folder (with all of its content) and put it inside (what used to be) a sibling folder, the data on the disk does not get moved around. The data storage is mostly independent of the nested structure you choose to use.
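To make that concrete, here is a minimal Python sketch, assuming a local file system where a rename within the same volume is a metadata-only operation (typical, but not guaranteed everywhere). The folder names `a` and `b` and the file `data.txt` are made up for illustration. It moves a folder into its sibling and compares the file's inode number before and after.

```python
import os
import tempfile


def move_into_sibling(root):
    """Create root/a/data.txt and root/b, then move folder a inside b.

    Returns the inode number of data.txt before and after the move.
    """
    a = os.path.join(root, "a")
    b = os.path.join(root, "b")
    os.mkdir(a)
    os.mkdir(b)

    file_before = os.path.join(a, "data.txt")
    with open(file_before, "w") as fh:
        fh.write("payload")
    inode_before = os.stat(file_before).st_ino

    # Same-volume rename: only the directory entries change.
    os.rename(a, os.path.join(b, "a"))

    file_after = os.path.join(b, "a", "data.txt")
    inode_after = os.stat(file_after).st_ino
    return inode_before, inode_after


with tempfile.TemporaryDirectory() as root:
    before, after = move_into_sibling(root)
    # On typical local file systems the two numbers are equal,
    # i.e. the file itself was not copied or rewritten.
    print(before == after)
```

If the inode is unchanged, the file was never copied; only the directory entry pointing at it was updated.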

In reality, your OS (at least the OSes I am familiar with) does not care about how many folders are in the same parent folder. For all practical non-extreme applications, as far as your OS is concerned, the end result is exactly the same whether you nest your folders or not.

File systems use nested structures because people (not machines) like neat little boxes. You've already discovered this as you're emulating a tree structure yet your code cares little for using that same structure behind the scenes.

"perhaps instead of a guid for the path I should nest them"

You already know which parent folder they belong to (otherwise you'd never be able to reconstruct the user's folder tree). I'm not quite sure what the issue is here. It seems like you've identified an issue and have the information at your disposal to fix it. What specifically are you asking, other than a "go ahead"?

If you are not yet sure whether there is a problem to be solved, read on. If you are already sure that the flat list is not the way to go, then stop reading and start solving.

The rest of this answer is a summary of considerations. I cannot answer it for you, but I can highlight what the consequences of your decisions are (if you nest or don't nest).

Do your admins care about neat little boxes?

Any human who has to work on the server's file system will generally prefer nested structures. However, it's possible you do not intend for this directory to be accessed by humans at any rate, which makes this consideration null and void.

The fact that you're using guids as folder names suggests to me that you do not expect human readability. If you do expect it, definitely don't just use guids without further context. Your admins would lose their marbles trying to compare guids all day just to find the right directory.

Do you have a limit on nesting levels?

Older OSes used to have a path character limit (IIRC Windows XP and earlier capped the full file path at roughly 260 characters). That limit is less of a problem nowadays because it has been raised, but it can still bite if you're on an OS with a practical limit and your users want to nest things very deeply.

By using a flat list behind the scenes, you actually avoid hitting the limit, because every path stays roughly the same length (root folder path + guid + filename) even though the users still get the benefit of a nested list.
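A rough Python sketch of the difference, assuming a hypothetical storage root `/srv/files` and made-up folder and file names. It compares the on-disk path length of a file stored under a single guid folder with the same file stored twelve levels deep in a nested layout.

```python
import uuid
from pathlib import PurePosixPath

# Hypothetical storage root, chosen only for illustration.
ROOT = PurePosixPath("/srv/files")


def flat_path(folder_id, filename):
    """Flat layout: root / guid / filename, regardless of visual depth."""
    return ROOT / folder_id / filename


def nested_path(folder_names, filename):
    """Nested layout: every visual level becomes a real directory."""
    return ROOT.joinpath(*folder_names, filename)


# Twelve visual nesting levels with fairly long, user-chosen names.
levels = [f"level-{i:02d}-some-descriptive-name" for i in range(12)]

flat = flat_path(str(uuid.uuid4()), "report.pdf")
nested = nested_path(levels, "report.pdf")

# The flat path length does not depend on how deep the folder is visually;
# the nested path length grows with every level.
print(len(str(flat)), len(str(nested)))
```

In the flat layout the length depends only on the root, the 36-character guid, and the filename, so users can nest as deeply as they like without the real path growing.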

Do you perform operations on a folder and all of its subfolders?

If you do, that means you need to reconstruct the folder with all of its descendants before you can carry out the operation. Depending on your implementation, this may have a considerable impact on the storage device.

If your folder (on the file system) contains its parent (or child) information, that means you must fetch a folder before you can fetch its parent (or children). Due to the recursive nature of your (pretend) nested file system, that means that you have to repeatedly hit the storage device to reconstruct the parent folder with all its children.

If, however, your parent/child relations are stored elsewhere and you already know all involved folders before actually looking at the folder contents on the file system, then you are able to fetch all necessary folders and reconstruct them in-memory.
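As a minimal sketch of that second option, assume the relations live in a simple mapping from folder id to parent id (in practice this might be a database table; the names below are hypothetical and the ids are shortened for readability). The whole subtree can then be collected in memory before touching the file system at all.

```python
from collections import defaultdict, deque

# Hypothetical metadata store: folder id -> parent id (None = top level).
PARENT_OF = {
    "parent-a": None,
    "child": "parent-a",
    "grandchild": "child",
    "parent-b": None,
}


def descendants(folder_id, parent_of):
    """Return folder_id followed by all of its descendants, breadth-first."""
    # Invert the mapping once: parent id -> list of child ids.
    children = defaultdict(list)
    for fid, parent in parent_of.items():
        children[parent].append(fid)

    result = []
    queue = deque([folder_id])
    while queue:
        current = queue.popleft()
        result.append(current)
        queue.extend(children[current])
    return result


print(descendants("parent-a", PARENT_OF))
# ['parent-a', 'child', 'grandchild']
```

With the list of ids in hand, each flat directory can be processed directly, with no recursive lookups against the storage device.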

By comparison, if you had nested your folders on the server's file system, you would be able to fetch the entire nested structure with a single command, or simply call a file system command that executes for the folder and all of its subdirectories (at least on all OSes I'm familiar with).
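For illustration, here is a small Python sketch of the nested case, using a temporary directory and made-up folder names. One `os.walk` call enumerates a folder with all of its subfolders, and one `shutil.rmtree` call removes the whole subtree.

```python
import os
import shutil
import tempfile


def list_subtree(top):
    """Return every directory under top (inclusive), relative to top."""
    found = []
    for dirpath, _dirnames, _filenames in os.walk(top):
        found.append(os.path.relpath(dirpath, top))
    return sorted(found)


# Build a throwaway nested structure: parent_a/child/grandchild.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "parent_a", "child", "grandchild"))

# One call walks the folder and everything below it.
subtree = list_subtree(os.path.join(root, "parent_a"))
print(subtree)

# One call deletes the folder and everything below it.
shutil.rmtree(os.path.join(root, "parent_a"))
remaining = os.listdir(root)

shutil.rmtree(root)  # clean up the temporary root
```

The trade-off is that these conveniences only work because the real directory tree mirrors the visual one, which brings back the path-length concern discussed earlier.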

Licensed under: CC-BY-SA with attribution